
pandas Quick Start 3 (Reshaping, Pivot Tables, Time Series, Visualization, Loading and Saving Data)

Published 2020-08-11 17:21 · 433 views · 0 comments · 15 likes · 1 favorite


Reshaping, pivot tables, time series, visualization, and loading/saving data

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> tuples=list(zip(*[['bar','bar','baz','baz','foo','foo','qux','qux'],
              ['one','two','one','two','one','two','one','two']]))
>>> tuples
[('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two'), ('foo', 'one'), ('foo', 'two'), ('qux', 'one'), ('qux', 'two')]


>>> index=pd.MultiIndex.from_tuples(tuples,names=['first','second'])
>>> index
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])

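When the levels form a full Cartesian product, as they do here, `MultiIndex.from_product` builds the same index without spelling out every tuple. A minimal sketch:

```python
import pandas as pd

# from_product crosses the level values, producing the same 8-entry
# index as the zipped-tuples approach above.
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'])
print(index[0])    # ('bar', 'one')
print(len(index))  # 8
```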

>>> df=pd.DataFrame(np.random.randn(8,2),index=index,columns=list('AB'))
>>> df
                     A         B
first second                    
bar   one    -0.082379  1.779122
      two    -2.104446  2.653455
baz   one    -1.176593  1.422018
      two    -2.566068  0.869758
foo   one    -1.313857 -0.685978
      two    -0.424554 -0.286876
qux   one     0.828323 -1.311333
      two    -0.358734  0.225068

>>> stacked=df.stack()
>>> stacked
first  second   
bar    one     A   -0.082379
               B    1.779122
       two     A   -2.104446
               B    2.653455
baz    one     A   -1.176593
               B    1.422018
       two     A   -2.566068
               B    0.869758
foo    one     A   -1.313857
               B   -0.685978
       two     A   -0.424554
               B   -0.286876
qux    one     A    0.828323
               B   -1.311333
       two     A   -0.358734
               B    0.225068
dtype: float64

>>> stacked.index
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two'], ['A', 'B']],
           codes=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second', None])
>>> stacked.unstack()
                     A         B
first second                    
bar   one    -0.082379  1.779122
      two    -2.104446  2.653455
baz   one    -1.176593  1.422018
      two    -2.566068  0.869758
foo   one    -1.313857 -0.685978
      two    -0.424554 -0.286876
qux   one     0.828323 -1.311333
      two    -0.358734  0.225068
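As the round trip above suggests, `unstack()` pops the innermost index level by default, but any level can be chosen by name or position. A small sketch with deterministic data:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=list('AB'))

stacked = df.stack()
# Default unstack() reverses the stack exactly...
print(stacked.unstack().equals(df))  # True
# ...while naming a level pivots that level into the columns instead.
wide = stacked.unstack('first')
print(wide.shape)                    # (4, 2): (second, A/B) rows x (bar, baz) columns
```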

>>> df=pd.DataFrame({'A':['one','one','two','three']*3,
                'B':['A','B','C']*4,
                'C':np.random.randn(12),
                'D':np.random.randn(12)})
>>> df
        A  B         C         D
0     one  A  0.889017  0.371978
1     one  B -1.171235 -0.255796
2     two  C -1.659094  1.217755
3   three  A -0.803847 -0.200328
4     one  B -1.877187  0.704438
5     one  C -0.740347 -0.847248
6     two  A  1.323123 -0.636503
7   three  B  0.596142  0.603215
8     one  C  1.270051 -1.599640
9     one  A  1.377637  0.170785
10    two  B -0.373041 -0.697124
11  three  C -1.901653  2.072091
>>> df.pivot_table(values=['D'],index=['A','B'],columns=['C'])
                D                                          ...                                                  
C       -1.901653 -1.877187 -1.659094 -1.171235 -0.803847  ...  0.596142  0.889017  1.270051  1.323123  1.377637
A     B                                                    ...                                                  
one   A       NaN       NaN       NaN       NaN       NaN  ...       NaN  0.371978       NaN       NaN  0.170785
      B       NaN  0.704438       NaN -0.255796       NaN  ...       NaN       NaN       NaN       NaN       NaN
      C       NaN       NaN       NaN       NaN       NaN  ...       NaN       NaN  -1.59964       NaN       NaN
three A       NaN       NaN       NaN       NaN -0.200328  ...       NaN       NaN       NaN       NaN       NaN
      B       NaN       NaN       NaN       NaN       NaN  ...  0.603215       NaN       NaN       NaN       NaN
      C  2.072091       NaN       NaN       NaN       NaN  ...       NaN       NaN       NaN       NaN       NaN
two   A       NaN       NaN       NaN       NaN       NaN  ...       NaN       NaN       NaN -0.636503       NaN
      B       NaN       NaN       NaN       NaN       NaN  ...       NaN       NaN       NaN       NaN       NaN
      C       NaN       NaN  1.217755       NaN       NaN  ...       NaN       NaN       NaN       NaN       NaN

[9 rows x 12 columns]
>>> df=pd.DataFrame({'A':['one','one','two','three']*3,
                'B':['A','B','C']*4,
                'C':['foo','foo','foo','bar','bar','bar']*2,
                'D':np.random.randn(12),
                'E':np.random.randn(12)})
>>> df.pivot_table(values=['D'],index=['A','B'],columns=['C'])  # pivot table
                D          
C             bar       foo
A     B                    
one   A -2.282182 -0.270135
      B  1.254654 -0.279277
      C  1.526645 -0.419079
three A -0.966811       NaN
      B       NaN -1.899693
      C -0.842920       NaN
two   A       NaN -1.469618
      B -0.225424       NaN
      C       NaN  0.554990
>>> df.pivot_table(values=['E'],index=['A'],columns=['C'])
              E          
C           bar       foo
A                        
one    0.168522  0.258927
three -0.566985  0.091718
two   -1.633643  0.401005
>>> df[df.A=='one'].groupby('C').mean()
            D         E
C                      
bar  0.166372  0.168522
foo -0.322830  0.258927
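`pivot_table` aggregates with the mean by default; `aggfunc` selects another reducer, and the same numbers can be reached with `groupby` plus `unstack`, as the last command hints. A small sketch with deterministic data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.arange(12.0)})

# aggfunc swaps the default mean for any reducer.
pt = df.pivot_table(values='D', index='A', columns='C', aggfunc='sum')

# Equivalent groupby formulation: aggregate, then pivot C into columns.
gb = df.groupby(['A', 'C'])['D'].sum().unstack('C')
print(pt.equals(gb))  # True
```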
>>> rng=pd.date_range('20160301',periods=600,freq='s')
>>> s=pd.Series(np.random.randint(0,500,len(rng)),index=rng)
>>> s
2016-03-01 00:00:00    169
2016-03-01 00:00:01     96
2016-03-01 00:00:02    321
2016-03-01 00:00:03    308
2016-03-01 00:00:04    103
2016-03-01 00:00:05    190
2016-03-01 00:00:06    164
2016-03-01 00:00:07    243
2016-03-01 00:00:08    162
2016-03-01 00:00:09    164
2016-03-01 00:00:10     41
2016-03-01 00:00:11    274
2016-03-01 00:00:12    441
2016-03-01 00:00:13    345
2016-03-01 00:00:14    200
2016-03-01 00:00:15    217
2016-03-01 00:00:16     76
2016-03-01 00:00:17    214
2016-03-01 00:00:18    145
2016-03-01 00:00:19    440
2016-03-01 00:00:20     37
2016-03-01 00:00:21    244
2016-03-01 00:00:22    433
2016-03-01 00:00:23    460
2016-03-01 00:00:24     96
2016-03-01 00:00:25    492
2016-03-01 00:00:26    310
2016-03-01 00:00:27    227
2016-03-01 00:00:28     93
2016-03-01 00:00:29    364
                      ... 
2016-03-01 00:09:30    480
2016-03-01 00:09:31    410
2016-03-01 00:09:32    130
2016-03-01 00:09:33    322
2016-03-01 00:09:34    490
2016-03-01 00:09:35    313
2016-03-01 00:09:36    470
2016-03-01 00:09:37    368
2016-03-01 00:09:38    167
2016-03-01 00:09:39    126
2016-03-01 00:09:40     74
2016-03-01 00:09:41     46
2016-03-01 00:09:42    327
2016-03-01 00:09:43    102
2016-03-01 00:09:44    471
2016-03-01 00:09:45    383
2016-03-01 00:09:46    495
2016-03-01 00:09:47    226
2016-03-01 00:09:48    404
2016-03-01 00:09:49    142
2016-03-01 00:09:50     24
2016-03-01 00:09:51    261
2016-03-01 00:09:52     82
2016-03-01 00:09:53    126
2016-03-01 00:09:54    179
2016-03-01 00:09:55    459
2016-03-01 00:09:56    412
2016-03-01 00:09:57     50
2016-03-01 00:09:58     34
2016-03-01 00:09:59    249
Freq: S, Length: 600, dtype: int32
>>> s.resample('2Min').sum()
2016-03-01 00:00:00    28596
2016-03-01 00:02:00    28047
2016-03-01 00:04:00    28618
2016-03-01 00:06:00    26790
2016-03-01 00:08:00    31616
Freq: 2T, dtype: int32
>>> s.resample('2Min').mean()
2016-03-01 00:00:00    238.300000
2016-03-01 00:02:00    233.725000
2016-03-01 00:04:00    238.483333
2016-03-01 00:06:00    223.250000
2016-03-01 00:08:00    263.466667
Freq: 2T, dtype: float64
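The `how=` argument to `resample` was deprecated and later removed; the deferred syntax `.resample(...).sum()` is the current form. A minimal sketch with a constant series, so the bucket totals are predictable:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-03-01', periods=600, freq='s')
s = pd.Series(np.ones(600, dtype=int), index=rng)

# Pick the bucket size in resample(), then the reducer as a method call.
total = s.resample('2min').sum()
print(len(total))            # 5 buckets of 120 seconds each
print(int(total.iloc[0]))    # 120
```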
>>> rng=pd.period_range('2000Q1','2016Q1',freq='Q')
>>> rng
PeriodIndex(['2000Q1', '2000Q2', '2000Q3', '2000Q4', '2001Q1', '2001Q2',
             '2001Q3', '2001Q4', '2002Q1', '2002Q2', '2002Q3', '2002Q4',
             '2003Q1', '2003Q2', '2003Q3', '2003Q4', '2004Q1', '2004Q2',
             '2004Q3', '2004Q4', '2005Q1', '2005Q2', '2005Q3', '2005Q4',
             '2006Q1', '2006Q2', '2006Q3', '2006Q4', '2007Q1', '2007Q2',
             '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3', '2008Q4',
             '2009Q1', '2009Q2', '2009Q3', '2009Q4', '2010Q1', '2010Q2',
             '2010Q3', '2010Q4', '2011Q1', '2011Q2', '2011Q3', '2011Q4',
             '2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1', '2013Q2',
             '2013Q3', '2013Q4', '2014Q1', '2014Q2', '2014Q3', '2014Q4',
             '2015Q1', '2015Q2', '2015Q3', '2015Q4', '2016Q1'],
            dtype='period[Q-DEC]', freq='Q-DEC')
>>> rng.to_timestamp()
DatetimeIndex(['2000-01-01', '2000-04-01', '2000-07-01', '2000-10-01',
               '2001-01-01', '2001-04-01', '2001-07-01', '2001-10-01',
               '2002-01-01', '2002-04-01', '2002-07-01', '2002-10-01',
               '2003-01-01', '2003-04-01', '2003-07-01', '2003-10-01',
               '2004-01-01', '2004-04-01', '2004-07-01', '2004-10-01',
               '2005-01-01', '2005-04-01', '2005-07-01', '2005-10-01',
               '2006-01-01', '2006-04-01', '2006-07-01', '2006-10-01',
               '2007-01-01', '2007-04-01', '2007-07-01', '2007-10-01',
               '2008-01-01', '2008-04-01', '2008-07-01', '2008-10-01',
               '2009-01-01', '2009-04-01', '2009-07-01', '2009-10-01',
               '2010-01-01', '2010-04-01', '2010-07-01', '2010-10-01',
               '2011-01-01', '2011-04-01', '2011-07-01', '2011-10-01',
               '2012-01-01', '2012-04-01', '2012-07-01', '2012-10-01',
               '2013-01-01', '2013-04-01', '2013-07-01', '2013-10-01',
               '2014-01-01', '2014-04-01', '2014-07-01', '2014-10-01',
               '2015-01-01', '2015-04-01', '2015-07-01', '2015-10-01',
               '2016-01-01'],
              dtype='datetime64[ns]', freq='QS-OCT')
>>> pd.Timestamp('20160301')-pd.Timestamp('20160201')
Timedelta('29 days 00:00:00')
>>> pd.Timestamp('20160301')+pd.Timedelta(days=5)
Timestamp('2016-03-06 00:00:00')
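Timestamp/Timedelta arithmetic also broadcasts over a whole DatetimeIndex, which is often how it is used in practice. A short sketch:

```python
import pandas as pd

# Shifting a DatetimeIndex works the same as shifting a single Timestamp.
dates = pd.date_range('2016-03-01', periods=3, freq='D')
shifted = dates + pd.Timedelta(hours=12)
print(shifted[0])  # 2016-03-01 12:00:00

gap = pd.Timestamp('2016-03-01') - pd.Timestamp('2016-02-01')
print(gap.days)    # 29 (2016 is a leap year)
```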
>>> df=pd.DataFrame({'id':[1,2,3,4,5,6],'raw_grade':['a','b','b','a','a','d']})
>>> df
   id raw_grade
0   1         a
1   2         b
2   3         b
3   4         a
4   5         a
5   6         d
>>> df['grade']=df.raw_grade.astype('category')
>>> df
   id raw_grade grade
0   1         a     a
1   2         b     b
2   3         b     b
3   4         a     a
4   5         a     a
5   6         d     d
>>> df.grade
0    a
1    b
2    b
3    a
4    a
5    d
Name: grade, dtype: category
Categories (3, object): [a, b, d]
>>> df.grade.cat.categories
Index(['a', 'b', 'd'], dtype='object')
>>> df.grade.cat.categories=['very good','good','bad']
>>> df
   id raw_grade      grade
0   1         a  very good
1   2         b       good
2   3         b       good
3   4         a  very good
4   5         a  very good
5   6         d        bad
>>> df.sort_values(by='grade',ascending=False)
   id raw_grade      grade
5   6         d        bad
2   3         b       good
1   2         b       good
4   5         a  very good
3   4         a  very good
0   1         a  very good
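Note that assigning directly to `.cat.categories`, as the transcript above does, was removed in pandas 2.0; `rename_categories` is the supported replacement. A minimal sketch of the same relabeling:

```python
import pandas as pd

df = pd.DataFrame({'raw_grade': ['a', 'b', 'b', 'a', 'a', 'd']})
grade = df['raw_grade'].astype('category')

# rename_categories maps the existing categories ['a', 'b', 'd'],
# in order, to the new labels.
df['grade'] = grade.cat.rename_categories(['very good', 'good', 'bad'])
print(df['grade'].cat.categories.tolist())  # ['very good', 'good', 'bad']
```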
>>> s=pd.Series(np.random.randn(1000),index=pd.date_range('20000101',periods=1000))
>>> s=s.cumsum()
>>> s.plot()
<matplotlib.axes._subplots.AxesSubplot object at 0x000002536CDFE320>
>>> df=pd.DataFrame(np.random.randn(100,4),columns=list('ABCD'))
>>> df.to_csv('data.csv')
>>> pd.read_csv('data.csv')
    Unnamed: 0         A         B         C         D
0            0  0.295907 -0.157520  0.350067 -1.536092
1            1 -0.541243  0.416142  0.889910 -1.639846
2            2 -0.024508  0.356052  0.419268  1.299046
3            3 -0.527460 -0.153432  0.833790 -0.958435
4            4  0.347399  0.758116 -1.414419  0.675910
5            5 -0.260096  0.745978 -1.243023 -0.650448
6            6 -1.213138  0.143236 -0.871795 -0.139552
7            7 -1.287272  0.860484  1.143898  1.452429
8            8 -0.740297 -1.344640 -1.610561 -0.867919
9            9 -0.087605 -0.106898 -0.978996  1.526133
10          10  0.010966 -0.340403  1.285855  0.269217
11          11 -0.683929 -0.018515  1.297817 -1.196779
12          12 -1.260310  0.494402  0.623227  0.131635
13          13 -0.112332  0.119360  0.353475 -1.203791
14          14 -0.660513 -0.359216  0.629409  2.176046
15          15 -0.106322  0.911818  0.188707 -1.835326
16          16 -0.200058  1.121014  0.381226  1.101125
17          17  1.531576 -0.955301  1.168179  0.086679
18          18  1.134901 -0.063212  0.706886 -0.502317
19          19 -0.048796 -1.340618 -0.137974  0.842102
20          20 -0.741845  0.906082  0.340839  0.170244
21          21  0.326208 -0.123587 -1.640063 -0.065321
22          22 -1.178746 -0.429873 -0.598956  1.666959
23          23  1.007721 -0.822012 -0.195927 -1.283707
24          24 -1.349399 -1.717132  0.132711 -0.382790
25          25 -0.258322 -0.437100 -0.035122 -1.221744
26          26 -0.867473  0.940612 -2.273239 -1.499303
27          27  0.075474 -1.355473 -1.202164 -1.586610
28          28 -0.378404 -1.302919  0.360150 -1.216197
29          29  1.420782  0.257367 -0.065424 -1.822010
..         ...       ...       ...       ...       ...
70          70  0.957772  0.163276 -0.934935  1.292447
71          71 -0.722990  0.725030  0.149166  1.612356
72          72  0.142796 -1.611163 -1.464041  1.204536
73          73 -0.047842 -1.587596  0.232701 -2.399041
74          74 -0.750892 -1.014131 -0.934193 -0.138739
75          75 -1.883991  0.276754 -1.169615 -1.593522
76          76 -0.804246  0.363827 -2.498148 -0.420052
77          77 -1.220221 -0.278808 -0.023398  0.850487
78          78  2.318605 -0.847733 -0.630146  1.162671
79          79  0.090052 -0.063712 -0.576621 -0.012520
80          80 -0.088946  1.503521 -0.656729 -0.084417
81          81  1.531391  1.588890 -0.386054  2.078755
82          82 -0.879237  1.217273 -0.167847  0.537144
83          83 -0.557477 -1.338260  0.174868 -0.460700
84          84 -2.624321 -0.713480  0.386414  2.176986
85          85 -0.103021 -1.921546 -0.059659  1.281954
86          86  0.014018  0.832007 -1.543069 -0.771887
87          87  0.000057  2.235368  0.302721  0.690989
88          88  0.325149  0.328327 -0.397771 -0.620698
89          89 -0.418378 -1.629091  1.292397  2.447032
90          90 -2.248013  0.155704  1.162686  0.855392
91          91 -1.020387  0.696714 -0.922030  1.082949
92          92 -1.385169  0.314946 -0.190743 -0.423856
93          93  0.603514  0.444533 -1.994167 -0.846152
94          94  1.331633 -0.514216 -0.838180  0.908482
95          95 -0.337609 -0.637598  0.371794  0.729177
96          96  0.793032 -0.225214  0.011796 -0.362990
97          97 -0.625474 -0.517675 -1.091153 -0.497015
98          98  0.603308 -0.455933  1.099731 -0.775502
99          99  1.353817  0.127764  0.787311  1.796455

[100 rows x 5 columns]
>>> pd.read_csv('data.csv',index_col=0)  # use the first column as the row index
           A         B         C         D
0   0.295907 -0.157520  0.350067 -1.536092
1  -0.541243  0.416142  0.889910 -1.639846
2  -0.024508  0.356052  0.419268  1.299046
3  -0.527460 -0.153432  0.833790 -0.958435
4   0.347399  0.758116 -1.414419  0.675910
5  -0.260096  0.745978 -1.243023 -0.650448
6  -1.213138  0.143236 -0.871795 -0.139552
7  -1.287272  0.860484  1.143898  1.452429
8  -0.740297 -1.344640 -1.610561 -0.867919
9  -0.087605 -0.106898 -0.978996  1.526133
10  0.010966 -0.340403  1.285855  0.269217
11 -0.683929 -0.018515  1.297817 -1.196779
12 -1.260310  0.494402  0.623227  0.131635
13 -0.112332  0.119360  0.353475 -1.203791
14 -0.660513 -0.359216  0.629409  2.176046
15 -0.106322  0.911818  0.188707 -1.835326
16 -0.200058  1.121014  0.381226  1.101125
17  1.531576 -0.955301  1.168179  0.086679
18  1.134901 -0.063212  0.706886 -0.502317
19 -0.048796 -1.340618 -0.137974  0.842102
20 -0.741845  0.906082  0.340839  0.170244
21  0.326208 -0.123587 -1.640063 -0.065321
22 -1.178746 -0.429873 -0.598956  1.666959
23  1.007721 -0.822012 -0.195927 -1.283707
24 -1.349399 -1.717132  0.132711 -0.382790
25 -0.258322 -0.437100 -0.035122 -1.221744
26 -0.867473  0.940612 -2.273239 -1.499303
27  0.075474 -1.355473 -1.202164 -1.586610
28 -0.378404 -1.302919  0.360150 -1.216197
29  1.420782  0.257367 -0.065424 -1.822010
..       ...       ...       ...       ...
70  0.957772  0.163276 -0.934935  1.292447
71 -0.722990  0.725030  0.149166  1.612356
72  0.142796 -1.611163 -1.464041  1.204536
73 -0.047842 -1.587596  0.232701 -2.399041
74 -0.750892 -1.014131 -0.934193 -0.138739
75 -1.883991  0.276754 -1.169615 -1.593522
76 -0.804246  0.363827 -2.498148 -0.420052
77 -1.220221 -0.278808 -0.023398  0.850487
78  2.318605 -0.847733 -0.630146  1.162671
79  0.090052 -0.063712 -0.576621 -0.012520
80 -0.088946  1.503521 -0.656729 -0.084417
81  1.531391  1.588890 -0.386054  2.078755
82 -0.879237  1.217273 -0.167847  0.537144
83 -0.557477 -1.338260  0.174868 -0.460700
84 -2.624321 -0.713480  0.386414  2.176986
85 -0.103021 -1.921546 -0.059659  1.281954
86  0.014018  0.832007 -1.543069 -0.771887
87  0.000057  2.235368  0.302721  0.690989
88  0.325149  0.328327 -0.397771 -0.620698
89 -0.418378 -1.629091  1.292397  2.447032
90 -2.248013  0.155704  1.162686  0.855392
91 -1.020387  0.696714 -0.922030  1.082949
92 -1.385169  0.314946 -0.190743 -0.423856
93  0.603514  0.444533 -1.994167 -0.846152
94  1.331633 -0.514216 -0.838180  0.908482
95 -0.337609 -0.637598  0.371794  0.729177
96  0.793032 -0.225214  0.011796 -0.362990
97 -0.625474 -0.517675 -1.091153 -0.497015
98  0.603308 -0.455933  1.099731 -0.775502
99  1.353817  0.127764  0.787311  1.796455

[100 rows x 4 columns]
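The `Unnamed: 0` column seen in the first read is the DataFrame's row index, which `to_csv` writes out by default; passing `index=False` when writing (or `index_col=0` when reading, as above) avoids it. A small round-trip sketch using an in-memory buffer instead of a file:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

# index=False drops the row labels, so the round trip comes back
# without the extra 'Unnamed: 0' column.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)
print(list(back.columns))  # ['A', 'B', 'C', 'D']
```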



Author: dfh8374

Link: https://www.pythonheidong.com/blog/article/479335/

Source: python黑洞网 (pythonheidong.com)

Please credit the source when reprinting in any form; infringement, once found, will be pursued legally.
