I want to concatenate several strings in a dataframe based on a groupby in Pandas.
This is my code so far:
    import pandas as pd
    from io import StringIO

    data = StringIO("""
    "name1","hej","2014-11-01"
    "name1","du","2014-11-02"
    "name1","aj","2014-12-01"
    "name1","oj","2014-12-02"
    "name2","fin","2014-11-01"
    "name2","katt","2014-11-02"
    "name2","mycket","2014-12-01"
    "name2","lite","2014-12-01"
    """)

    # load string as stream into dataframe
    # header=None: the data has no header row, so header=0 would consume the first record
    df = pd.read_csv(data, header=None, names=["name","text","date"], parse_dates=[2])

    # add column with month
    df["month"] = df["date"].apply(lambda x: x.month)
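I want the end result to have one row per name and month, with the text entries of each group concatenated into a single string.
I don't understand how to use groupby and apply some sort of string concatenation to the "text" column. Any help is appreciated!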
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:
    In [119]:
    df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
    df[['name','text','month']].drop_duplicates()

    Out[119]:
        name         text  month
    0  name1       hej,du     11
    2  name1        aj,oj     12
    4  name2     fin,katt     11
    6  name2  mycket,lite     12
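I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates, because after transform every row in a group carries the same joined text.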
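To see why drop_duplicates is needed after transform, here is a minimal self-contained sketch. It assumes a small hand-built frame (the month column is typed in directly rather than derived from dates) and uses the hypothetical variable names `joined` and `result`; it illustrates the idea and is not the exact pipeline from the question.

```python
import pandas as pd

# Hand-built stand-in for the question's data (assumption: months entered directly).
df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name1", "name2", "name2", "name2", "name2"],
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform broadcasts each group's joined string back to every row of that group,
# so the result has the same length and index as df.
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))

# Every row of a group now carries the same text, so the duplicates can be dropped.
result = df.assign(text=joined)[["name", "text", "month"]].drop_duplicates()
print(result)
```

Because transform keeps the original index, the surviving rows keep the labels of the first row in each group (0, 2, 4, 6), which matches the output shown above.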
EDIT: actually, I can just call apply and then reset_index:
    In [124]:
    df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

    Out[124]:
        name  month         text
    0  name1     11       hej,du
    1  name1     12        aj,oj
    2  name2     11     fin,katt
    3  name2     12  mycket,lite
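For reference, the whole pipeline can be sketched end to end as follows. This version makes a few substitutions that are assumptions on my part rather than what was run above: it reads the CSV with `header=None` so that the first record is not consumed as a header, derives the month with `.dt.month`, and uses `agg` in place of `apply`.

```python
import pandas as pd
from io import StringIO

data = StringIO('''"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
''')

# No header row in the data, so supply the column names explicitly.
df = pd.read_csv(data, header=None, names=["name", "text", "date"], parse_dates=[2])

# Vectorised month extraction from the parsed datetime column.
df["month"] = df["date"].dt.month

# One row per (name, month); the group keys become ordinary columns again.
result = (
    df.groupby(["name", "month"])["text"]
      .agg(",".join)
      .reset_index()
)
print(result)
```

Unlike the transform approach, this aggregates each group down to a single row, so no drop_duplicates step is needed.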
Update
The lambda is unnecessary here:
    In [38]:
    df.groupby(['name','month'])['text'].apply(','.join).reset_index()

    Out[38]:
        name  month         text
    0  name1     11       hej,du
    1  name1     12        aj,oj
    2  name2     11     fin,katt
    3  name2     12  mycket,lite
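The lambda can be dropped because `','.join` is already a callable that accepts any iterable of strings, and each group is passed to it as a Series. A small sketch checking that both spellings agree, on a hand-built frame (assumed data, hypothetical variable names):

```python
import pandas as pd

# Small assumed sample; only the shape of the data matters here.
df = pd.DataFrame({
    "name": ["name1", "name1", "name2"],
    "month": [11, 11, 11],
    "text": ["hej", "du", "fin"],
})

grouped = df.groupby(["name", "month"])["text"]

# Wrapping join in a lambda and passing the bound method directly
# both hand each group's Series to str.join.
with_lambda = grouped.apply(lambda x: ",".join(x)).reset_index()
without_lambda = grouped.apply(",".join).reset_index()

print(without_lambda)
print(with_lambda.equals(without_lambda))
```

Note that `str.join` raises a TypeError if a group contains non-string values such as NaN, so a column with missing or numeric entries would need cleaning first, for example with `dropna()` or `astype(str)`.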