How do I plot the data from multiple files in a single histogram? [duplicate]

Posted 2023-11-18 10:22


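For data analysis, I currently display the data from the individual folders in separate histograms. To get a better idea of the spread of the measured values, I would now like to display all of the data in a folder in a single histogram. Unfortunately, this is beyond my technical abilities. The catch is that I did not write this code myself; a colleague sent it to me. I have modified it a little to fit my problem, but my Python knowledge is not yet good enough to take it further.

This is the code so far: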

# import libraries
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import itertools
import pandas as pd
import scipy.interpolate as interp
import ptu_reader_JHU as ptu
import csv
from scipy.optimize import curve_fit
from glob import glob

# read all .ptu files in folder
files = glob("P:/schlaefke/WRS_Test/New_Data/Multirange_Grid_smallest_Area4/*.ptu")
for file in files:
    
    # create Array with the timestamps of all markers
    header, markertime = ptu.ptu_markertime(file)

    # create array with the distances of the markers
    markertimeshift=np.empty_like(markertime)
    markertimeshift[:1]=0
    markertimeshift[1:]=markertime[:-1]
    markerdistance=np.subtract(markertime, markertimeshift)

    # convert markerdistance into seconds
    markerdistance = markerdistance*header['MeasDesc_GlobalResolution']

    # remove the first marker (sometimes it's very fast)
    markerdist_removed=markerdistance[1:]

    # create arrays with long and short distances
    markerdist_group1=markerdist_removed[::2]
    markerdist_group2=markerdist_removed[1::2]

    # calculate variance of the line length
    if max(markerdist_group1)>max(markerdist_group2):
            average = np.average(markerdist_group1)
            variance = np.var(markerdist_group1)
            relvariance = variance/average
            outlier = max((average-min(markerdist_group1), max(markerdist_group1)-average))
            reloutlier = outlier/average
        
    else:
            average = np.average(markerdist_group2)
            variance = np.var(markerdist_group2)
            relvariance = variance/average
            outlier = max((average-min(markerdist_group2), max(markerdist_group2)-average))
            reloutlier = outlier/average

    # find average of long distance
    average_line=max((np.average(markerdist_group1), np.average(markerdist_group2)))

    # determine the number of bins. The bin width is 10% of the larger distance (10% of the maximum duration of one line)
    binwidth = average_line/10

    binnr = np.floor((max(markerdistance)-min(markerdistance))/binwidth)

    # create and save plots

    for i in files:
        plt.hist(markerdistance, bins=int(binnr))
        plt.xlabel('Time of markers (s)')
        plt.ylabel('Number of markers')
    plt.show()

     # export data in csv file
    with open("P:\schlaefke\WRS_Test\PTU_Data.csv", "a", newline = "") as f:
        writer = csv.writer(f)
        writer.writerow([variance, relvariance, outlier, reloutlier])

I also tried seaborn, but that did not work either. I think I am passing the wrong variables:

import seaborn as sns

data_histogramm = pd.DataFrame(markerdistance, bins=int(binnr))
sns.histplot(data_histogramm, x="Time of markers (s)", y= "Number of markers")

I would be very grateful for any help or hints :)


Solution


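There appear to be a few issues in your code. The histogram is drawn inside the per-file loop (and repeated in an inner loop over files), so each figure only ever contains the data of one file. In the seaborn attempt, bins is passed to pd.DataFrame, which has no such argument, and the axis label strings are passed as x and y, where seaborn expects column names.

Here is the corrected code: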

# import libraries
import csv
from glob import glob

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import ptu_reader_JHU as ptu
import seaborn as sns

# read all .ptu files in folder
files = glob("P:/schlaefke/WRS_Test/New_Data/Multirange_Grid_smallest_Area4/*.ptu")

# create empty lists to store the marker distances and line averages of all files
all_marker_distances = []
all_average_lines = []

for file in files:
    # ... (your existing code to process each file, without the plotting loop)

    # append this file's marker distances and long-line average to the lists
    all_marker_distances.extend(markerdistance)
    all_average_lines.append(average_line)

    # export this file's statistics to a CSV file (one row per file, as before)
    with open("P:/schlaefke/WRS_Test/PTU_Data.csv", "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([variance, relvariance, outlier, reloutlier])

# determine the number of bins for the combined data
# (bin width: 10% of the long-line duration, averaged over all files)
binwidth = np.mean(all_average_lines) / 10
binnr = max(1, int(np.floor((max(all_marker_distances) - min(all_marker_distances)) / binwidth)))

# create the combined plot
plt.hist(all_marker_distances, bins=binnr)
plt.xlabel('Time of markers (s)')
plt.ylabel('Number of markers')
plt.show()
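
The pooling idea can be tried out without any .ptu files. The following is only a sketch under stated assumptions: the timestamps, the resolution value, and the helper names marker_distances and pooled_bin_count are made up for illustration and are not part of ptu_reader_JHU. It uses np.diff with prepend=0 as a compact equivalent of the shift-and-subtract step in the loop.

```python
import numpy as np


def marker_distances(markertime, resolution):
    """Time between consecutive markers, in seconds.

    Same result as the shift-and-subtract step in the loop: the first
    entry is the first timestamp itself, the rest are differences.
    """
    markertime = np.asarray(markertime, dtype=float)
    return np.diff(markertime, prepend=0.0) * resolution


def pooled_bin_count(distances, average_line):
    """Number of bins when each bin is 10% of the long-line duration."""
    binwidth = average_line / 10
    span = np.max(distances) - np.min(distances)
    return max(1, int(np.floor(span / binwidth)))


# Hypothetical stand-ins for two files: marker timestamps in clock ticks
fake_files = {
    "a.ptu": [5, 105, 125, 225, 245],
    "b.ptu": [4, 104, 126, 226, 244],
}
resolution = 1e-3  # assumed seconds per tick, for illustration only

# Pool the distances of all files into one array
pooled = []
for name, markertime in fake_files.items():
    pooled.extend(marker_distances(markertime, resolution))
pooled = np.array(pooled)

# One histogram over the pooled data (average_line=0.1 s is assumed here)
nbins = pooled_bin_count(pooled, average_line=0.1)
counts, edges = np.histogram(pooled, bins=nbins)
print(nbins, counts.sum())
```

Because every file contributes to the same array before binning, the counts always add up to the total number of markers across files.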

Now create a DataFrame for Seaborn:

data_histogram = pd.DataFrame({'Time of markers (s)': all_marker_distances})

and plot the histogram with Seaborn:

sns.histplot(data_histogram, x="Time of markers (s)", bins=binnr)
plt.show()

Adjust the Seaborn part to your needs; histplot has further parameters you may want to use for customization.
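
If it matters which file a value came from, one possible variation is to draw each file as its own outline on shared bin edges instead of merging everything into one bar series. The sketch below uses plain matplotlib with randomly generated stand-in data; the labels file_1 and file_2 and the choice of 30 bins are assumptions for illustration. Seaborn's histplot offers a comparable view through its hue parameter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up marker distances for two files (seconds), for illustration only
rng = np.random.default_rng(0)
per_file = {
    "file_1": rng.normal(0.10, 0.005, 200),
    "file_2": rng.normal(0.11, 0.010, 200),
}

# Compute the bin edges once from the pooled data so all files share them
pooled = np.concatenate(list(per_file.values()))
edges = np.histogram_bin_edges(pooled, bins=30)

# Passing a list of arrays draws one outlined histogram per file
fig, ax = plt.subplots()
counts, _, _ = ax.hist(
    list(per_file.values()),
    bins=edges,
    histtype="step",
    label=list(per_file.keys()),
)
ax.set_xlabel("Time of markers (s)")
ax.set_ylabel("Number of markers")
ax.legend()
# Call plt.show() or fig.savefig(...) to view the figure
```

Sharing the edges keeps the per-file outlines directly comparable, since every bar covers the same time interval in each file.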



Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/2039588/f175ce17bf154666d4e7/

Source: python黑洞网
