I want to make sure that the first val2 value for each vintage is NaN. Two of them are already NaN, but I want 0.53 to be changed to NaN as well.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
        'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
        'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
        'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
    })
Here is what I have tried so far:
    df.groupby('vintage').first().val2  # This gives the first non-NaN values, as shown below
    vintage
    2017-01-01    0.66
    2017-02-01    0.53
    2017-03-01     NaN

    df.groupby('vintage').first().val2 = np.nan  # This doesn't change anything
    df.val2
    0     NaN
    1    0.66
    2    0.81
    3    0.53
    4    0.62
    5     NaN
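You can't assign to the result of an aggregation, because it is a new object and not a view on the original df. In addition, first skips existing NaN values. What you can do instead is call head(1), which returns the first row of each group, and pass its index to loc on the original df to overwrite those column values: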
    In [91]: df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
             df
    Out[91]:
             date  val1  val2     vintage
    0  2017-01-01  0.59   NaN  2017-01-01
    1  2017-02-01  0.68  0.66  2017-01-01
    2  2017-03-01  0.80  0.81  2017-01-01
    3  2017-02-01  0.54   NaN  2017-02-01
    4  2017-03-01  0.61  0.62  2017-02-01
    5  2017-03-01  0.60   NaN  2017-03-01
Here you can see that head(1) returns the first row of each group:
    In [94]: df.groupby('vintage')['val2'].head(1)
    Out[94]:
    0     NaN
    3    0.53
    5     NaN
    Name: val2, dtype: float64
In contrast, first returns the first non-NaN value of each group, unless the group contains only NaN values:
    In [95]: df.groupby('vintage')['val2'].first()
    Out[95]:
    vintage
    2017-01-01    0.66
    2017-02-01    0.53
    2017-03-01     NaN
    Name: val2, dtype: float64
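For comparison, the same result can probably be reached with a boolean mask instead of an index lookup. The sketch below is an alternative, not the approach shown above: it assumes that groupby(...).cumcount() numbers the rows within each group starting at 0, so position 0 marks the first row of each vintage. The name is_first is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# Illustrative name: True for the row at position 0 within its vintage group.
is_first = df.groupby('vintage').cumcount() == 0

# Overwrite val2 on the original frame for those rows only.
df.loc[is_first, 'val2'] = np.nan

print(df)
```

Like head(1), this selects rows by position within the group, so it does not skip NaN values the way first does. It also does not depend on the index labels being unique, which the head(1).index lookup does.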