Pandas Tutorial Series – How to Add New Columns

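In data analysis you often need to create a new column based on some condition and then analyze it further. This article covers four ways to do that: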

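Direct assignment
The df.apply method
The df.assign method
Selecting rows by condition and assigning a value to each group separately
WeChat Official Account: 蚂蚁学Python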

import pandas as pd

0. Read the CSV data into a DataFrame

fpath = "./datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)

df.head()

	ymd	bWendu	yWendu	tianqi	fengxiang	fengli	aqi	aqiInfo	aqiLevel
0	2018-01-01	3℃	-6℃	晴~多云	东北风	1-2级	59	良	2
1	2018-01-02	2℃	-5℃	阴~多云	东北风	1-2级	49	优	1
2	2018-01-03	2℃	-5℃	多云	北风	1-2级	28	优	1
3	2018-01-04	0℃	-8℃	阴	东北风	1-2级	28	优	1
4	2018-01-05	3℃	-6℃	多云~晴	西北风	1-2级	50	优	1
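The CSV path above is specific to the author's setup. If you don't have that file, you can rebuild the first rows shown above from an inline string. The five rows below are copied from the `df.head()` output; the rest of the year's data is not reproduced. `read_csv` accepts any file-like object, so `io.StringIO` can stand in for the file path.

```python
import io

import pandas as pd

# The five rows below are copied from the df.head() output above;
# they stand in for the full 2018 file, which is not reproduced here.
csv_text = """ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel
2018-01-01,3℃,-6℃,晴~多云,东北风,1-2级,59,良,2
2018-01-02,2℃,-5℃,阴~多云,东北风,1-2级,49,优,1
2018-01-03,2℃,-5℃,多云,北风,1-2级,28,优,1
2018-01-04,0℃,-8℃,阴,东北风,1-2级,28,优,1
2018-01-05,3℃,-6℃,多云~晴,西北风,1-2级,50,优,1
"""

# read_csv reads from a file-like object just as it does from a path
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

At this stage the temperature columns are still strings such as "3℃", which is why the next step cleans them.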

1. Direct assignment

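Example: clean the temperature columns (bWendu is the daily high, yWendu the daily low) and convert them to integers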

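# Strip the ℃ suffix from the temperatures and convert to int32.
# Assign with df[col] rather than df.loc[:, col]: in pandas 2.x, loc[:, col]
# tries to write into the existing object column in place, which can keep the old dtype.
df["bWendu"] = df["bWendu"].str.replace("℃", "", regex=False).astype("int32")
df["yWendu"] = df["yWendu"].str.replace("℃", "", regex=False).astype("int32")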

df.head()

	ymd	bWendu	yWendu	tianqi	fengxiang	fengli	aqi	aqiInfo	aqiLevel
0	2018-01-01	3	-6	晴~多云	东北风	1-2级	59	良	2
1	2018-01-02	2	-5	阴~多云	东北风	1-2级	49	优	1
2	2018-01-03	2	-5	多云	北风	1-2级	28	优	1
3	2018-01-04	0	-8	阴	东北风	1-2级	28	优	1
4	2018-01-05	3	-6	多云~晴	西北风	1-2级	50	优	1
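When several columns share the same suffix, you can wrap the cleanup in a small loop. The sketch below runs on a hypothetical three-row sample, and the helper name `strip_suffix_to_int` is made up for illustration. `regex=False` makes `str.replace` treat the suffix as a literal string rather than a regular expression.

```python
import pandas as pd


def strip_suffix_to_int(df, columns, suffix="℃"):
    """Hypothetical helper: remove a text suffix and convert the columns to int32."""
    for col in columns:
        # regex=False treats the suffix as a literal string, not a pattern
        df[col] = df[col].str.replace(suffix, "", regex=False).astype("int32")
    return df


# Hypothetical sample in the same format as the raw CSV columns
df = pd.DataFrame({"bWendu": ["3℃", "2℃", "0℃"], "yWendu": ["-6℃", "-5℃", "-8℃"]})
df = strip_suffix_to_int(df, ["bWendu", "yWendu"])
print(df.dtypes)
```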

Example: compute the temperature difference (daily high minus daily low)

# Note: df["bWendu"] is a Series, so the subtraction is element-wise and returns a Series
df.loc[:, "wencha"] = df["bWendu"] - df["yWendu"]

df.head()

	ymd	bWendu	yWendu	tianqi	fengxiang	fengli	aqi	aqiInfo	aqiLevel	wencha
0	2018-01-01	3	-6	晴~多云	东北风	1-2级	59	良	2	9
1	2018-01-02	2	-5	阴~多云	东北风	1-2级	49	优	1	7
2	2018-01-03	2	-5	多云	北风	1-2级	28	优	1	7
3	2018-01-04	0	-8	阴	东北风	1-2级	28	优	1	8
4	2018-01-05	3	-6	多云~晴	西北风	1-2级	50	优	1	9
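Arithmetic between two Series is aligned on the index, so each row's high is paired with the same row's low. A plain `df["wencha"] = ...` creates the new column just as the `loc` form above does. The sketch below shows this on hypothetical, already-cleaned integer temperatures.

```python
import pandas as pd

# Hypothetical cleaned temperatures (ints, as after the suffix removal above)
df = pd.DataFrame({"bWendu": [3, 2, 2, 0, 3], "yWendu": [-6, -5, -5, -8, -6]})

# Element-wise subtraction of two Series, aligned on their shared index
df["wencha"] = df["bWendu"] - df["yWendu"]
print(df)
```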

2. The df.apply method

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).
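To see what `axis` changes, the sketch below applies a function that simply returns each Series' `.name`. It uses a hypothetical two-row frame. With `axis=0` the function is called once per column; with `axis=1` it is called once per row.

```python
import pandas as pd

# Hypothetical two-row frame with string row labels
df = pd.DataFrame({"bWendu": [3, 2], "yWendu": [-6, -5]}, index=["d1", "d2"])

# axis=0: each call receives one column; its .name is the column label
per_column = df.apply(lambda s: s.name, axis=0)

# axis=1: each call receives one row; its .name is the row's index label
per_row = df.apply(lambda s: s.name, axis=1)

print(per_column.tolist())
print(per_row.tolist())
```

For a rule that compares several fields of the same row, like the example below, you need `axis=1`.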

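Example: add a temperature-type column:

If the daily high (bWendu) is above 33℃, the day is hot (高温)
If the daily low (yWendu) is below -10℃, the day is cold (低温)
Otherwise it is normal (常温)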

def get_wendu_type(x):
    # x is one row of df, passed in as a Series indexed by column name
    if x["bWendu"] > 33:
        return '高温'  # hot
    if x["yWendu"] < -10:
        return '低温'  # cold
    return '常温'  # normal

# Note: axis=1 is required; each row is then passed as a Series whose index is the column labels
df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)

# Count how many days fall into each temperature type
df["wendu_type"].value_counts()

常温    328
高温     29
低温      8
Name: wendu_type, dtype: int64
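`apply` with `axis=1` calls the Python function once per row, which can be slow on large frames. A different technique, not the one used above, is NumPy's `np.select`. It checks the conditions in order and takes the first one that matches, just like the `if` chain in `get_wendu_type`, so it can express the same rule in a single vectorized call. The sketch below uses hypothetical rows chosen to cover all three categories.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering all three categories
df = pd.DataFrame({"bWendu": [35, 3, -5], "yWendu": [25, -6, -12]})

conditions = [
    df["bWendu"] > 33,   # hot: daily high above 33
    df["yWendu"] < -10,  # cold: daily low below -10
]
choices = ["高温", "低温"]  # "hot", "cold"

# np.select takes the first condition that is True, otherwise the default
df["wendu_type"] = np.select(conditions, choices, default="常温")  # "normal"
print(df)
```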

3. The df.assign method

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones.

Example: convert the temperatures from Celsius to Fahrenheit

# Several new columns can be added in one call.
# Each lambda converts Celsius to Fahrenheit: F = C * 9 / 5 + 32
df.assign(
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32
)

	ymd	bWendu	yWendu	tianqi	fengxiang	fengli	aqi	aqiInfo	aqiLevel	wencha	wendu_type	yWendu_huashi	bWendu_huashi
0	2018-01-01	3	-6	晴~多云	东北风	1-2级	59	良	2	9	常温	21.2	37.4
1	2018-01-02	2	-5	阴~多云	东北风	1-2级	49	优	1	7	常温	23.0	35.6
2	2018-01-03	2	-5	多云	北风	1-2级	28	优	1	7	常温	23.0	35.6
3	2018-01-04	0	-8	阴	东北风	1-2级	28	优	1	8	常温	17.6	32.0
4	2018-01-05	3	-6	多云~晴	西北风	1-2级	50	优	1	9	常温	21.2	37.4
...	...	...	...	...	...	...	...	...	...	...	...	...	...
360	2018-12-27	-5	-12	多云~晴	西北风	3级	48	优	1	7	低温	10.4	23.0
361	2018-12-28	-3	-11	晴	西北风	3级	40	优	1	8	低温	12.2	26.6
362	2018-12-29	-3	-12	晴	西北风	2级	29	优	1	9	低温	10.4	26.6
363	2018-12-30	-2	-11	晴~多云	东北风	1级	31	优	1	9	低温	12.2	28.4
364	2018-12-31	-2	-10	多云	东北风	1级	56	良	2	8	常温	14.0	28.4

365 rows × 13 columns
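`assign` returns a new DataFrame, so you need to capture the result (for example `df = df.assign(...)`) to keep the new columns. The call above only displays them. A later keyword argument in the same `assign` call can also use a column created by an earlier one. The sketch below shows both points on hypothetical values; the column name `wencha_huashi` is made up for illustration.

```python
import pandas as pd

# Hypothetical cleaned temperatures in Celsius
df = pd.DataFrame({"bWendu": [3, 0], "yWendu": [-6, -8]})

new_df = df.assign(
    # Celsius to Fahrenheit: F = C * 9 / 5 + 32
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32,
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,
    # A later keyword argument can use columns created earlier in the same call
    wencha_huashi=lambda x: x["bWendu_huashi"] - x["yWendu_huashi"],
)

print(df.columns.tolist())      # the original df is unchanged
print(new_df.columns.tolist())  # the returned object has the extra columns
```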

4. Selecting rows by condition and assigning a value to each group separately

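First select the rows that match a condition, then assign the new column's value for just those rows.
Example: if the gap between the daily high and low is greater than 10℃, mark the temperature range as large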

# First create an empty column (this itself uses method 1, direct assignment)
df['wencha_type'] = ''

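# Rows whose range exceeds 10 degrees: "温差大" (large range)
df.loc[df["bWendu"] - df["yWendu"] > 10, "wencha_type"] = "温差大"

# All remaining rows: "温差正常" (normal range)
df.loc[df["bWendu"] - df["yWendu"] <= 10, "wencha_type"] = "温差正常"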

df["wencha_type"].value_counts()

温差正常    187
温差大     178
Name: wencha_type, dtype: int64
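For a two-way split like this one, a common alternative is `numpy.where`. It builds the whole column in one vectorized call, instead of creating an empty column and filling it with two `loc` assignments. The sketch below uses hypothetical rows.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned temperatures
df = pd.DataFrame({"bWendu": [3, 20, 0], "yWendu": [-6, 5, -8]})

# Boolean Series: True where the daily range exceeds 10 degrees
is_large = (df["bWendu"] - df["yWendu"]) > 10

# np.where(cond, value_if_true, value_if_false), evaluated element-wise
df["wencha_type"] = np.where(is_large, "温差大", "温差正常")  # "large", "normal"
print(df)
```

With more than two groups, the `loc` approach above (or `np.select`) scales more naturally than nested `np.where` calls.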