Published 2019-08-22 19:52
data1 = data.head(6)  # first six rows
data2 = data.tail(6)  # last six rows
print(data1)
print('----'*50)
print(data2)
Rank City State Population Date of census/estimate
0 1 London[2] United Kingdom 8,615,246 1-Jun-14
1 2 Berlin Germany 3,437,916 31-May-14
2 3 Madrid Spain 3,165,235 1-Jan-14
3 4 Rome Italy 2,872,086 30-Sep-14
4 5 Paris France 2,273,305 1-Jan-13
5 6 Bucharest Romania 1,883,425 20-Oct-11
---------------------------------------------------------------------------
Rank City State Population Date of census/estimate
99 100 Valladolid Spain 311,501 1-Jan-12
100 101 Bonn Germany 309,869 31-Dec-12
101 102 Malmö Sweden 309,105 31-Mar-13
102 103 Nottingham United Kingdom 308,735 30-Jun-12
103 104 Katowice Poland 308,269 30-Jun-12
104 105 Kaunas Lithuania 306,888 1-Jan-13
print(data.index)
print('--'*40)
print(data.columns)
print('--'*40)
print(data.values)
RangeIndex(start=0, stop=105, step=1)
--------------------------------------------------------------------------------
Index(['Rank', 'City', 'State', 'Population', 'Date of census/estimate'], dtype='object')
--------------------------------------------------------------------------------
[[1 'London[2]' ' United Kingdom' '8,615,246' '1-Jun-14']
[2 'Berlin' ' Germany' '3,437,916' '31-May-14']
[3 'Madrid' ' Spain' '3,165,235' '1-Jan-14']
[4 'Rome' ' Italy' '2,872,086' '30-Sep-14']
[5 'Paris' ' France' '2,273,305' '1-Jan-13']
...
[101 'Bonn' ' Germany' '309,869' '31-Dec-12']
[102 'Malmö' ' Sweden' '309,105' '31-Mar-13']
[103 'Nottingham' ' United Kingdom' '308,735' '30-Jun-12']
[104 'Katowice' ' Poland' '308,269' '30-Jun-12']
[105 'Kaunas' ' Lithuania' '306,888' '1-Jan-13']]
print(data.describe())
Rank
count 105.000000
mean 53.057143
std 30.428298
min 1.000000
25% 27.000000
50% 53.000000
75% 79.000000
max 105.000000
print(data)
print('--'*40)
print(data.T)
Rank City State Population Date of census/estimate
0 1 London[2] United Kingdom 8,615,246 1-Jun-14
1 2 Berlin Germany 3,437,916 31-May-14
2 3 Madrid Spain 3,165,235 1-Jan-14
3 4 Rome Italy 2,872,086 30-Sep-14
4 5 Paris France 2,273,305 1-Jan-13
.. ... ... ... ... ...
100 101 Bonn Germany 309,869 31-Dec-12
101 102 Malmö Sweden 309,105 31-Mar-13
102 103 Nottingham United Kingdom 308,735 30-Jun-12
103 104 Katowice Poland 308,269 30-Jun-12
104 105 Kaunas Lithuania 306,888 1-Jan-13
[105 rows x 5 columns]
--------------------------------------------------------------------------------
0 1 ... 103 104
Rank 1 2 ... 104 105
City London[2] Berlin ... Katowice Kaunas
State United Kingdom Germany ... Poland Lithuania
Population 8,615,246 3,437,916 ... 308,269 306,888
Date of census/estimate 1-Jun-14 31-May-14 ... 30-Jun-12 1-Jan-13
[5 rows x 105 columns]
print(data.sort_index(axis=0, ascending=False))  # axis=0 sorts by the row index (axis=1 would sort by column labels); ascending=False means descending order
Rank City State Population Date of census/estimate
104 105 Kaunas Lithuania 306,888 1-Jan-13
103 104 Katowice Poland 308,269 30-Jun-12
102 103 Nottingham United Kingdom 308,735 30-Jun-12
101 102 Malmö Sweden 309,105 31-Mar-13
100 101 Bonn Germany 309,869 31-Dec-12
.. ... ... ... ... ...
4 5 Paris France 2,273,305 1-Jan-13
3 4 Rome Italy 2,872,086 30-Sep-14
2 3 Madrid Spain 3,165,235 1-Jan-14
1 2 Berlin Germany 3,437,916 31-May-14
0 1 London[2] United Kingdom 8,615,246 1-Jun-14
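To make the `axis` argument of `sort_index` concrete, here is a minimal sketch on a three-row stand-in for `data` (the small frame `df` is an assumption built from rows shown above, not the full dataset):

```python
import pandas as pd

# Three rows copied from the table above; a stand-in for the full `data` frame.
df = pd.DataFrame(
    {"Rank": [1, 2, 3], "City": ["London[2]", "Berlin", "Madrid"]},
    index=[0, 1, 2],
)

by_rows = df.sort_index(axis=0, ascending=False)  # reorder rows by index label, descending
by_cols = df.sort_index(axis=1)                   # reorder columns by column name, ascending

print(by_rows.index.tolist())    # [2, 1, 0]
print(by_cols.columns.tolist())  # ['City', 'Rank']
```

Neither call changes `df` itself; both return a new, reordered frame.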
print(data.sort_values(['City']))
Rank City State Population Date of census/estimate
91 92 Aarhus Denmark 326,676 1-Oct-14
85 86 Alicante Spain 334,678 1-Jan-12
22 23 Amsterdam Netherlands 813,562 31-May-14
58 59 Antwerp Belgium 510,610 1-Jan-14
33 34 Athens Greece 664,046 24-May-11
.. ... ... ... ... ...
34 35 Wrocław Poland 632,432 31-Mar-14
82 83 Wuppertal Germany 342,885 31-Dec-12
23 24 Zagreb Croatia 790,017 31-Mar-11
32 33 Zaragoza Spain 666,058 1-Jan-14
27 28 Łódź Poland 709,757 31-Mar-14
print(data['State'])
0 United Kingdom
1 Germany
2 Spain
3 Italy
4 France
...
100 Germany
101 Sweden
102 United Kingdom
103 Poland
104 Lithuania
Name: State, Length: 105, dtype: object
print(data[:5])
Rank City State Population Date of census/estimate
0 1 London[2] United Kingdom 8,615,246 1-Jun-14
1 2 Berlin Germany 3,437,916 31-May-14
2 3 Madrid Spain 3,165,235 1-Jan-14
3 4 Rome Italy 2,872,086 30-Sep-14
4 5 Paris France 2,273,305 1-Jan-13
print(data.loc[data.index[0]])
Rank 1
City London[2]
State United Kingdom
Population 8,615,246
Date of census/estimate 1-Jun-14
Name: 0, dtype: object
print(data.loc[:, ['State', 'Population']])
State Population
0 United Kingdom 8,615,246
1 Germany 3,437,916
2 Spain 3,165,235
3 Italy 2,872,086
4 France 2,273,305
.. ... ...
100 Germany 309,869
101 Sweden 309,105
102 United Kingdom 308,735
103 Poland 308,269
104 Lithuania 306,888
[105 rows x 2 columns]
print(data.loc[1: 4, ['State', 'Population']])
print(data.loc[1: 4, 'Rank':'Population'])
print(data.loc[[1, 3], 'City':'Population'])
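# loc accepts either a slice of labels or individual labels. Note that a label slice is not
# the usual Python slice: both endpoints are included, so 1:4 returns rows 1, 2, 3 and 4.
# A two-dimensional array is really just a table: it has fields and values, and it is built
# on two axes. axis=0 is the row index and axis=1 is the column labels.
# The two axes are what locate an element. In statistical work we usually operate on data in
# bulk, so the data is defined as whole sequences addressed by label or position. Each field
# name is a key along one axis, and its value is the entire run of data that follows it.
# That may be the essence of the structure: every row and every column of the table is a
# 'key' = 'value' mapping, i.e. a Series, and Series woven together form a DataFrame.
# This is my personal understanding (go easy on me if you disagree).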
type | index | Series | Series | Series |
---|---|---|---|---|
- | - | Rank | State | Population |
Series | 0 | 1 | A | A |
Series | 1 | 2 | B | B |
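The points about label slices and the "table of Series" picture can be checked with a short sketch. The five-row frame `df` below is an assumption, a cut-down copy of the rows shown earlier:

```python
import pandas as pd

# Cut-down stand-in for `data`, using the first five rows printed above.
df = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "State": ["United Kingdom", "Germany", "Spain", "Italy", "France"],
    "Population": ["8,615,246", "3,437,916", "3,165,235", "2,872,086", "2,273,305"],
})

by_label = df.loc[1:4, "State"]   # label slice: rows 1, 2, 3 AND 4
by_position = df.iloc[1:4, 1]     # positional slice: rows 1, 2, 3 only

print(len(by_label), len(by_position))  # 4 3
print(type(df["State"]).__name__)       # Series (one column)
print(type(df.loc[1]).__name__)         # Series (one row)
```

So a single column and a single row both come back as a `Series`, which fits the idea that a `DataFrame` is made of interleaved `Series`.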
print(data.loc[1, ['State', 'Population']])  # sounds fancy, but it simply locates data; returns a <class 'pandas.core.series.Series'>
State Germany
Population 3,437,916
Name: 1, dtype: object
print(data.loc[1, 'Population'])  # like peeling an onion: one more layer of indexing and you reach the scalar itself
print(type(data.loc[1, 'Population']))
3,437,916
<class 'str'>
print(data.at[1, 'Population'])  # equivalent to data.loc[1, 'Population'] above
print(type(data.at[1, 'Population']))
3,437,916
<class 'str'>
print(data.iloc[1])
Rank 2
City Berlin
State Germany
Population 3,437,916
Date of census/estimate 31-May-14
Name: 1, dtype: object
data.iloc[1:3, 0: 4]
Rank City State Population
1 2 Berlin Germany 3,437,916
2 3 Madrid Spain 3,165,235
data.iloc[[1, 3, 5], [0, 1, 2]]
Rank City State
1 2 Berlin Germany
3 4 Rome Italy
5 6 Bucharest Romania
print(data.iloc[1:3, :])
Rank City State Population Date of census/estimate
1 2 Berlin Germany 3,437,916 31-May-14
2 3 Madrid Spain 3,165,235 1-Jan-14
print(data.iloc[:, 0:3])
Rank City State
0 1 London[2] United Kingdom
1 2 Berlin Germany
2 3 Madrid Spain
3 4 Rome Italy
4 5 Paris France
.. ... ... ...
100 101 Bonn Germany
101 102 Malmö Sweden
102 103 Nottingham United Kingdom
103 104 Katowice Poland
104 105 Kaunas Lithuania
[105 rows x 3 columns]
print(data.iloc[1, 1])
print(data.iat[1, 1])  # iat is the positional counterpart of at; data.at[1, 1] would look for a column labelled 1
Berlin
Berlin
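The difference between `at` (labels) and `iat` (integer positions) is easier to see when the index is not `0, 1, 2, ...`. A minimal sketch, assuming a hypothetical frame with string row labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"City": ["London[2]", "Berlin", "Madrid"], "Rank": [1, 2, 3]},
    index=["a", "b", "c"],  # hypothetical string labels, chosen for illustration
)

by_label = df.at["b", "City"]  # row label "b", column label "City"
by_position = df.iat[1, 0]     # second row, first column

# Passing positions to `at` fails because no row or column is labelled 1.
try:
    df.at[1, 1]
    lookup_failed = False
except (KeyError, ValueError):
    lookup_failed = True

print(by_label, by_position, lookup_failed)
```

With the default `RangeIndex` used in this post the row labels happen to equal the positions, which is why the mix-up is easy to miss there.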
data.Population = data.Population.apply(lambda x: int(x.replace(',', '')))
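# Take every value in the Population column, use the lambda to strip the thousands
# separators and convert each string to int, then assign the result back to Population.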
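As an aside, the same conversion can be written with pandas' vectorised string methods instead of `apply`. This is only a sketch of an alternative, not what the post uses; the three sample values are copied from the table above:

```python
import pandas as pd

population = pd.Series(["8,615,246", "3,437,916", "3,165,235"], name="Population")

# Approach used in the post: a Python-level function applied to each element.
via_apply = population.apply(lambda x: int(x.replace(",", "")))

# Alternative: strip the commas with the .str accessor, then cast the whole column.
via_str = population.str.replace(",", "", regex=False).astype(int)

print(via_str.tolist())  # [8615246, 3437916, 3165235]
```

Both produce the same integers. Note that either version can only be run once on a column, because after the conversion the values are no longer strings.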
print(data[data.Population > 1000000])
Rank City State Population Date of census/estimate
0 1 London[2] United Kingdom 8615246 1-Jun-14
1 2 Berlin Germany 3437916 31-May-14
2 3 Madrid Spain 3165235 1-Jan-14
3 4 Rome Italy 2872086 30-Sep-14
4 5 Paris France 2273305 1-Jan-13
5 6 Bucharest Romania 1883425 20-Oct-11
6 7 Vienna Austria 1794770 1-Jan-15
7 8 Hamburg[10] Germany 1746342 30-Dec-13
8 9 Budapest Hungary 1744665 1-Jan-14
9 10 Warsaw Poland 1729119 31-Mar-14
10 11 Barcelona Spain 1602386 1-Jan-14
11 12 Munich Germany 1407836 31-Dec-13
12 13 Milan Italy 1332516 30-Sep-14
13 14 Sofia Bulgaria 1291895 14-Dec-14
14 15 Prague Czech Republic 1246780 1-Jan-13
15 16 Brussels[17] Belgium 1175831 1-Jan-14
16 17 Birmingham United Kingdom 1092330 30-Jun-13
17 18 Cologne Germany 1034175 31-Dec-13
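print(data[data > 0])  # element-wise mask: cells that fail the condition become NaN; comparing the string columns with 0 may raise TypeError, so this is only safe on numeric columns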
a = list(range(len(data.index)))
a = pd.Series(a, index=data.index)  # the new column's index must match the index of the original data
data1 = data.copy()
data1['E'] = a
print(data1[data1['E'].isin([2, 4])])  # column E holds integers, so match against integers rather than strings
Rank City State Population Date of census/estimate E
2 3 Madrid Spain 3165235 1-Jan-14 2
4 5 Paris France 2273305 1-Jan-13 4
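A small sketch of filtering with `isin`, using a five-row stand-in (the frame `df` is an assumption). The candidate values are given as integers to match the column's dtype; whether string candidates such as `'2'` are coerced to match integers has differed between pandas versions, so it is safer not to rely on that:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["London[2]", "Berlin", "Madrid", "Rome", "Paris"],
    "E": [0, 1, 2, 3, 4],
})

mask = df["E"].isin([2, 4])  # boolean Series, True where E is 2 or 4
selected = df[mask]          # rows where the mask is True
rest = df[~mask]             # ~ inverts the mask

print(selected["City"].tolist())  # ['Madrid', 'Paris']
print(rest["City"].tolist())      # ['London[2]', 'Berlin', 'Rome']
```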
# Column f was already inserted in the previous post
data1.at[data.index[0], 'f'] = 1
print(data1)
Rank City State Population Date of census/estimate f
0 1 London[2] United Kingdom 8615246 1-Jun-14 1
1 2 Berlin Germany 3437916 31-May-14 1
2 3 Madrid Spain 3165235 1-Jan-14 2
3 4 Rome Italy 2872086 30-Sep-14 3
4 5 Paris France 2273305 1-Jan-13 4
.. ... ... ... ... ... ...
100 101 Bonn Germany 309869 31-Dec-12 100
101 102 Malmö Sweden 309105 31-Mar-13 101
102 103 Nottingham United Kingdom 308735 30-Jun-12 102
103 104 Katowice Poland 308269 30-Jun-12 103
104 105 Kaunas Lithuania 306888 1-Jan-13 104
[105 rows x 6 columns]
data1.iat[1,2] = 0
print(data1)
Rank City State Population Date of census/estimate f
0 1 London[2] United Kingdom 8615246 1-Jun-14 0
1 2 Berlin 0 3437916 31-May-14 1
2 3 Madrid Spain 3165235 1-Jan-14 2
3 4 Rome Italy 2872086 30-Sep-14 3
4 5 Paris France 2273305 1-Jan-13 4
.. ... ... ... ... ... ...
100 101 Bonn Germany 309869 31-Dec-12 100
101 102 Malmö Sweden 309105 31-Mar-13 101
102 103 Nottingham United Kingdom 308735 30-Jun-12 102
103 104 Katowice Poland 308269 30-Jun-12 103
104 105 Kaunas Lithuania 306888 1-Jan-13 104
[105 rows x 6 columns]
data1.loc[:, 'D'] = np.array([5] * len(data1))
print(data1)
Rank City State ... Date of census/estimate f D
0 1 London[2] United Kingdom ... 1-Jun-14 0 5
1 2 Berlin Germany ... 31-May-14 1 5
2 3 Madrid Spain ... 1-Jan-14 2 5
3 4 Rome Italy ... 30-Sep-14 3 5
4 5 Paris France ... 1-Jan-13 4 5
.. ... ... ... ... ... ... ..
100 101 Bonn Germany ... 31-Dec-12 100 5
101 102 Malmö Sweden ... 31-Mar-13 101 5
102 103 Nottingham United Kingdom ... 30-Jun-12 102 5
103 104 Katowice Poland ... 30-Jun-12 103 5
104 105 Kaunas Lithuania ... 1-Jan-13 104 5
[105 rows x 7 columns]
data2.loc[data2.f > 0, 'f'] = -data2.f  # a single .loc assignment avoids chained indexing, which can silently modify a copy
print(data2)
Rank City State Population Date of census/estimate f
0 1 London[2] United Kingdom 8615246 1-Jun-14 0
1 2 Berlin Germany 3437916 31-May-14 -1
2 3 Madrid Spain 3165235 1-Jan-14 -2
3 4 Rome Italy 2872086 30-Sep-14 -3
4 5 Paris France 2273305 1-Jan-13 -4
.. ... ... ... ... ... ...
100 101 Bonn Germany 309869 31-Dec-12 -100
101 102 Malmö Sweden 309105 31-Mar-13 -101
102 103 Nottingham United Kingdom 308735 30-Jun-12 -102
103 104 Katowice Poland 308269 30-Jun-12 -103
104 105 Kaunas Lithuania 306888 1-Jan-13 -104
[105 rows x 6 columns]
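Conditional assignment like the sign flip above can be sketched on a tiny frame. The three-row `df` is an assumption for illustration; the pattern is a boolean mask passed to `.loc` together with the column name:

```python
import pandas as pd

df = pd.DataFrame({"City": ["London[2]", "Berlin", "Madrid"], "f": [0, 1, 2]})

mask = df["f"] > 0                      # rows whose f is positive
df.loc[mask, "f"] = -df.loc[mask, "f"]  # negate only those rows, in place

print(df["f"].tolist())  # [0, -1, -2]
```

Writing through one `.loc[rows, column]` call makes it unambiguous that the original frame is being modified rather than a temporary copy.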
pandas uses np.nan to represent missing values. By default these values are excluded from computations.
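A quick sketch of what "excluded from computations" means in practice, using a made-up three-element `Series`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

mean_default = s.mean()             # NaN is skipped: (1 + 3) / 2
non_missing = s.count()             # counts only non-missing values
mean_strict = s.mean(skipna=False)  # NaN is kept, so the result is NaN

print(mean_default)  # 2.0
print(non_missing)   # 2
print(mean_strict)   # nan
```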
data3 = data2.reindex(index=data2.index, columns=list(data2.columns) + ['E'])
data3.loc[0:2, 'E'] = 1
print(data3)
Rank City State ... Date of census/estimate f E
0 1 London[2] United Kingdom ... 1-Jun-14 0 1.0
1 2 Berlin Germany ... 31-May-14 1 1.0
2 3 Madrid Spain ... 1-Jan-14 2 1.0
3 4 Rome Italy ... 30-Sep-14 3 NaN
4 5 Paris France ... 1-Jan-13 4 NaN
.. ... ... ... ... ... ... ...
100 101 Bonn Germany ... 31-Dec-12 100 NaN
101 102 Malmö Sweden ... 31-Mar-13 101 NaN
102 103 Nottingham United Kingdom ... 30-Jun-12 102 NaN
103 104 Katowice Poland ... 30-Jun-12 103 NaN
104 105 Kaunas Lithuania ... 1-Jan-13 104 NaN
[105 rows x 7 columns]
data4 = data3.dropna()
print(data4)
Rank City State Population Date of census/estimate f E
0 1 London[2] United Kingdom 8615246 1-Jun-14 0 0.0
1 2 Berlin Germany 3437916 31-May-14 1 1.0
2 3 Madrid Spain 3165235 1-Jan-14 2 2.0
data2.loc[0, 'f'] = None
data3 = data2.fillna(value=5)
data3.f = data3.f.astype(int)  # cast the filled column back to integers
print(data3)
Rank City State Population Date of census/estimate f
0 1 London[2] United Kingdom 8615246 1-Jun-14 5
1 2 Berlin Germany 3437916 31-May-14 1
2 3 Madrid Spain 3165235 1-Jan-14 2
3 4 Rome Italy 2872086 30-Sep-14 3
4 5 Paris France 2273305 1-Jan-13 4
.. ... ... ... ... ... ...
100 101 Bonn Germany 309869 31-Dec-12 100
101 102 Malmö Sweden 309105 31-Mar-13 101
102 103 Nottingham United Kingdom 308735 30-Jun-12 102
103 104 Katowice Poland 308269 30-Jun-12 103
104 105 Kaunas Lithuania 306888 1-Jan-13 104
[105 rows x 6 columns]
data2.loc[0, 'f'] = None
data3 = pd.isnull(data2)
print(data3)
Rank City State Population Date of census/estimate f
0 False False False False False True
1 False False False False False False
2 False False False False False False
3 False False False False False False
4 False False False False False False
.. ... ... ... ... ... ...
100 False False False False False False
101 False False False False False False
102 False False False False False False
103 False False False False False False
104 False False False False False False
[105 rows x 6 columns]
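print(round(data2.mean(numeric_only=True), 2))  # numeric_only=True skips the string columns, which recent pandas no longer drops silently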
Rank 53.06
Population 787679.09
f 52.50
dtype: float64
print(round(data2.mean(axis=1, numeric_only=True), 2))  # mean across the numeric columns of each row; rarely needed
0 4307623.50
1 1145973.00
2 1055080.00
3 957364.33
4 757771.33
...
100 103356.67
101 103102.67
102 102980.00
103 102825.33
104 102365.67
Length: 105, dtype: float64
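The two directions of `mean` can be checked by hand on a two-row stand-in (the frame `df` is an assumption, with Population already converted to integers):

```python
import pandas as pd

df = pd.DataFrame({
    "Rank": [1, 2],
    "City": ["London[2]", "Berlin"],
    "Population": [8615246, 3437916],
})

col_means = df.mean(axis=0, numeric_only=True)  # one value per numeric column
row_means = df.mean(axis=1, numeric_only=True)  # one value per row

print(col_means["Rank"])   # 1.5
print(row_means.tolist())  # [4307623.5, 1718959.0]
```

The row-wise result averages unrelated quantities (a rank and a population), which is why it is rarely useful on a table like this one.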
s = pd.Series(np.random.randint(0, 7, size=10))
print(s)
print(s.value_counts())  # how many times each distinct value occurs, most frequent first
0 0
1 2
2 1
3 2
4 1
5 1
6 3
7 4
8 1
9 5
dtype: int32
1 4
2 2
5 1
4 1
3 1
0 1
dtype: int64
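Because the random draw differs on every run, here is a reproducible sketch that hard-codes the ten values printed above and counts them:

```python
import pandas as pd

# The ten values shown in the printed Series above, fixed for reproducibility.
s = pd.Series([0, 2, 1, 2, 1, 1, 3, 4, 1, 5])

counts = s.value_counts()               # occurrences per distinct value, descending
shares = s.value_counts(normalize=True)  # the same as fractions of the total

print(counts.index[0], counts.iloc[0])  # 1 4
print(shares[1])                        # 0.4
```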
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
s.str.lower()
0 a
1 b
2 c
3 aaba
4 baca
5 NaN
6 caba
7 dog
8 cat
dtype: object
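The output above shows that the missing entry stays missing after `str.lower()`. A minimal sketch of that behaviour, on a shortened version of the same `Series`:

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "Aaba", np.nan, "CABA"])

lowered = s.str.lower()  # applied element-wise; the missing value is passed through

print(lowered.dropna().tolist())  # ['a', 'aaba', 'caba']
print(lowered.isna().tolist())    # [False, False, True, False]
```

This is what makes the `.str` accessor convenient on real data: there is no need to filter out missing values before calling a string method.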
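Author: ryuer8423
Link: https://www.pythonheidong.com/blog/article/53545/cd6305f65e6ca4efef49/
Source: python黑洞网
Please credit the source when reproducing this article in any form. Infringement will be pursued under the law once discovered.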