Posted on 2024-10-31 21:03
I am querying an InfluxDB database that stores its dates as timestamps. I run the query with this code:
query = (
    f"select * from {measurement} "
    "where time > '2021-03-28T02:02:58Z' AND time < '2021-03-28T02:05:00Z'"
)
result = client.query(query)
df_pandas = pd.DataFrame(list(result.get_points()))
df_pandas.head(20)
Then I apply some transformations:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
df_pandas_filtered = df_pandas[['time', 'idDispositivo', 'idSensor', 'valor']].copy()
df_pandas_filtered.loc[:,'time'] = df_pandas_filtered['time'].str.slice(0, 19)
df_pandas_filtered.loc[:,'time'] = pd.to_datetime(df_pandas_filtered['time'], format='%Y-%m-%dT%H:%M:%S', errors='coerce')
df_pandas_filtered['time'] = df_pandas_filtered['time'].astype("datetime64[ns]")
df_pandas_filtered.loc[:,'idDispositivo'] = pd.to_numeric(df_pandas_filtered['idDispositivo'],errors='coerce').astype('Int64').replace({pd.NA: None})
df_pandas_filtered.loc[:,'idSensor'] = pd.to_numeric(df_pandas_filtered['idSensor'],errors='coerce').astype('Int64').replace({pd.NA: None})
df_pandas_filtered.loc[:,'valor'] = pd.to_numeric(df_pandas_filtered['valor'],errors='coerce').astype('Int64').replace({pd.NA: None})
df_pandas_filtered.head(20)
The error occurs when I convert the result to a PySpark DataFrame:
spark_df = spark.createDataFrame(df_pandas_filtered, schema=schema)
spark_df.show()
NonExistentTimeError Traceback (most recent call last)
Cell In[61], line 2
1 # Convierte el Pandas DataFrame en un DataFrame de PySpark
----> 2 spark_df = spark.createDataFrame(df_pandas_filtered, schema=schema)
3 spark_df.show()
File C:\spark-3.5.0-bin-hadoop3\python\pyspark\sql\session.py:1440, in SparkSession.createDataFrame(self, data, schema, samplingRatio, verifySchema)
1436 data = pd.DataFrame(data, columns=column_names)
1438 if has_pandas and isinstance(data, pd.DataFrame):
1439 # Create a DataFrame from pandas DataFrame.
-> 1440 return super(SparkSession, self).createDataFrame( # type: ignore[call-overload]
1441 data, schema, samplingRatio, verifySchema
1442 )
1443 return self._create_dataframe(
1444 data, schema, samplingRatio, verifySchema # type: ignore[arg-type]
1445 )
File C:\spark-3.5.0-bin-hadoop3\python\pyspark\sql\pandas\conversion.py:362, in SparkConversionMixin.createDataFrame(self, data, schema, samplingRatio, verifySchema)
360 warn(msg)
361 raise
--> 362 converted_data = self._convert_from_pandas(data, schema, timezone)
363 return self._create_dataframe(converted_data, schema, samplingRatio, verifySchema)
File C:\spark-3.5.0-bin-hadoop3\python\pyspark\sql\pandas\conversion.py:474, in SparkConversionMixin._convert_from_pandas(self, pdf, schema, timezone)
...
File c:\Users\raidel.rodriguez\.conda\envs\data_summary\Lib\site-packages\pandas\_libs\tslibs\tzconversion.pyx:177, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc_single()
File c:\Users\raidel.rodriguez\.conda\envs\data_summary\Lib\site-packages\pandas\_libs\tslibs\tzconversion.pyx:417, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc()
NonExistentTimeError: 2021-03-28 02:02:58
I know it has to do with the daylight saving time change in my time zone (Central Europe). I have checked that the clocks changed in Spain on 2021-03-28, when local time jumped from 02:00 to 03:00, so 02:02:58 does not exist as a local time there. But I don't know how to fix it and I need help. Thanks in advance.
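The daylight saving explanation above can be reproduced with pandas alone, without Spark. The sketch below is only an illustration under stated assumptions: it assumes the InfluxDB values are UTC (the trailing "Z" in the query suggests this) and that the local zone is "Europe/Madrid", which is a guess based on the mention of Spain. The variable names are hypothetical and not taken from the original code.

```python
import pandas as pd

# Naive (timezone-unaware) timestamp taken from the traceback above.
naive = pd.Series(pd.to_datetime(["2021-03-28 02:02:58"]))

# Interpreting it as Madrid local time fails, because the clocks
# jumped from 02:00 to 03:00 that night and 02:02:58 never occurred.
try:
    naive.dt.tz_localize("Europe/Madrid")
    localize_failed = False
except Exception as exc:  # the exception class differs between pandas versions
    localize_failed = True
    print(f"{type(exc).__name__}: {exc}")

# Assumption: the stored values are UTC, so label them as UTC instead.
as_utc = naive.dt.tz_localize("UTC")

# Converting a valid UTC instant to local time does not hit the DST gap.
as_madrid = as_utc.dt.tz_convert("Europe/Madrid")
print(as_madrid.iloc[0])
```

If the values really are UTC, one option that may help is to make Spark interpret the naive timestamps as UTC as well, for example by calling spark.conf.set("spark.sql.session.timeZone", "UTC") before createDataFrame, since UTC has no daylight saving gaps. This has not been tested against the setup in the question.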
Author: 黑洞官方问答小能手
Link: https://www.pythonheidong.com/blog/article/2040410/7229740b033623905cb6/
Source: python黑洞网