Starting with the data from my previous question:

    import pandas as pd

    df = pd.DataFrame({'id': ['a', 'b', 'a'],
                       'val': [['val1', 'val2'],
                               ['val33', 'val9', 'val6'],
                               ['val2', 'val6', 'val7']]})
    print(df)

      id                  val
    0  a         [val1, val2]
    1  b  [val33, val9, val6]
    2  a   [val2, val6, val7]
How I turn the lists into a dict of counts:

    pd.Series([a for b in df.val.tolist() for a in b]).value_counts().to_dict()

    {'val1': 1, 'val2': 2, 'val33': 1, 'val6': 2, 'val7': 1, 'val9': 1}
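As a cross-check that does not depend on pandas, the same overall counts can be sketched with `collections.Counter` over the flattened lists. The name `rows` is an assumption for illustration: a plain-list copy of the `val` column.

```python
from collections import Counter
from itertools import chain

# Plain-list copy of the `val` column (hypothetical name, for illustration only).
rows = [['val1', 'val2'], ['val33', 'val9', 'val6'], ['val2', 'val6', 'val7']]

# Flatten the nested lists lazily, count each value, and convert to a plain dict.
overall = dict(Counter(chain.from_iterable(rows)))
print(overall)
```

The resulting dict should compare equal to the `value_counts().to_dict()` result, though the key order may differ.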
How I get the lists by group:

    df.groupby('id')["val"].apply(lambda x: [a for b in x.tolist() for a in b])

    id
    a    [val1, val2, val2, val6, val7]
    b               [val33, val9, val6]
    Name: val, dtype: object
How do I get a dict of counts for each group?

    df.groupby('id')["val"].apply(
        lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict()
    )
This returns:

    id
    a   val1     1.0
        val2     2.0
        val6     1.0
        val7     1.0
    b   val33    1.0
        val6     1.0
        val9     1.0
    Name: val, dtype: float64
Expected output (what am I overlooking?):

    id
    a    {'val1': 1, 'val2': 2, 'val6': 1, 'val7': 1}
    b             {'val33': 1, 'val6': 1, 'val9': 1}
    Name: val, dtype: object
Edited to use agg, as suggested by @ayhan (much faster than apply).

    from collections import Counter

    # Flatten each group's lists and count the values.
    df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))
Out:

    id
    a    {'val2': 2, 'val6': 1, 'val7': 1, 'val1': 1}
    b             {'val9': 1, 'val33': 1, 'val6': 1}
    Name: val, dtype: object
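To see what the per-group counting amounts to, here is a minimal pandas-free sketch that accumulates one `Counter` per id. The names `ids`, `vals`, `per_group`, and `result` are assumptions for illustration and do not come from the original code.

```python
from collections import Counter, defaultdict

# Plain-list copies of the `id` and `val` columns (hypothetical names).
ids = ['a', 'b', 'a']
vals = [['val1', 'val2'], ['val33', 'val9', 'val6'], ['val2', 'val6', 'val7']]

# One Counter per group; update() adds the counts of each row's list.
per_group = defaultdict(Counter)
for key, items in zip(ids, vals):
    per_group[key].update(items)

# Counter is a dict subclass; convert to plain dicts if that is what you need.
result = {key: dict(counts) for key, counts in per_group.items()}
print(result)
```

Because `Counter` is a `dict` subclass, the objects returned by the `agg` version can usually be used wherever a dict is expected. Wrapping them in `dict(...)` is only needed if a plain dict is required.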
Timing for this version:

    %timeit df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))

    1000 loops, best of 3: 820 µs per loop
Timing for @ayhan's version:

    %timeit df.groupby('id')["val"].agg(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict())

    100 loops, best of 3: 1.91 ms per loop