I have the following data frame in IPython, where each row is a single stock:
    In [261]: bdata
    Out[261]:
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 21210 entries, 0 to 21209
    Data columns:
    BloombergTicker    21206  non-null values
    Company            21210  non-null values
    Country            21210  non-null values
    MarketCap          21210  non-null values
    PriceReturn        21210  non-null values
    SEDOL              21210  non-null values
    yearmonth          21210  non-null values
    dtypes: float64(2), int64(1), object(4)
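I want to apply a groupby operation that computes the cap-weighted average return across all stocks, for each date in the "yearmonth" column.
This works as expected: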
    In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
    Out[262]:
    yearmonth
    201204      -0.109444
    201205      -0.290546
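To make the calculation concrete, here is a minimal sketch of the same lambda on made-up numbers. The `toy` frame and its values are hypothetical stand-ins for `bdata`, chosen so the result can be checked by hand; selecting the two value columns before `apply` is an assumption made to keep the snippet working on recent pandas versions.

```python
import pandas as pd

# Hypothetical toy data standing in for bdata (values are invented).
toy = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 200.0, 200.0],
    "PriceReturn": [0.10, -0.10, 0.02, 0.04],
})

# Same lambda as above: weight each return by its share of the group's
# total market cap, then sum within the group.
market_rets = toy.groupby("yearmonth")[["PriceReturn", "MarketCap"]].apply(
    lambda x: (x["PriceReturn"] * x["MarketCap"] / x["MarketCap"].sum()).sum()
)

# By hand (up to floating-point rounding):
#   201204: (0.10*100 - 0.10*300) / 400 = -0.05
#   201205: (0.02*200 + 0.04*200) / 400 =  0.03
print(market_rets)
```

The result is a Series with one value per `yearmonth`, indexed by the group key, which is why it cannot be assigned row by row without an alignment step.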
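But then I want to "broadcast" these values back to the indices of the original data frame, and save them as a constant column wherever the dates match.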
    In [263]: dateGrps = bdata.groupby("yearmonth")

    In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()
    ----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

    TypeError: 'DataFrameGroupBy' object does not support item assignment
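I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation to a new column on the parent dataframe?
In the end, I want a column called "MarketReturn" that holds a repeated constant value for all indices whose date matches the corresponding output of the groupby operation.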
One hack to achieve this would be the following:
    marketRetsByDate = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

    bdata["MarketReturn"] = np.repeat(np.NaN, len(bdata))

    for elem in marketRetsByDate.index.values:
        bdata["MarketReturn"][bdata["yearmonth"]==elem] = marketRetsByDate.ix[elem]
But this is slow, bad, and unPythonic.
    In [97]: df = pandas.DataFrame({'month': np.random.randint(0,11, 100), 'A': np.random.randn(100), 'B': np.random.randn(100)})

    In [98]: df.join(df.groupby('month')['A'].sum(), on='month', rsuffix='_r')
    Out[98]:
               A         B  month       A_r
    0  -0.040710  0.182269      0 -0.331816
    1  -0.004867  0.642243      1  2.448232
    2  -0.162191  0.442338      4  2.045909
    3  -0.979875  1.367018      5 -2.736399
    4  -1.126198  0.338946      5 -2.736399
    5  -0.992209 -1.343258      1  2.448232
    6  -1.450310  0.021290      0 -0.331816
    7  -0.675345 -1.359915      9  2.722156
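Applied to the cap-weighted return from the question, the same join pattern might look like the sketch below. The toy `bdata` frame and its values are hypothetical, and selecting the value columns before `apply` is an assumption made for compatibility with recent pandas versions. The second half shows `transform` as an alternative technique (not the join approach above) that returns values already aligned to the original rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's bdata.
bdata = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 200.0, 200.0],
    "PriceReturn": [0.10, -0.10, 0.02, 0.04],
})

# One cap-weighted return per yearmonth; naming the Series gives the
# joined column its name.
market_rets = (
    bdata.groupby("yearmonth")[["PriceReturn", "MarketCap"]]
    .apply(lambda x: (x["PriceReturn"] * x["MarketCap"] / x["MarketCap"].sum()).sum())
    .rename("MarketReturn")
)

# join(..., on="yearmonth") looks up each row's yearmonth in the Series
# index, so every row in a group receives that group's value.
joined = bdata.join(market_rets, on="yearmonth")

# Alternative: transform("sum") broadcasts each group's sum back onto
# the original index, so no join is needed.
cap_sum = bdata.groupby("yearmonth")["MarketCap"].transform("sum")
weighted = bdata["PriceReturn"] * bdata["MarketCap"] / cap_sum
bdata["MarketReturn"] = weighted.groupby(bdata["yearmonth"]).transform("sum")

print(joined)
print(bdata)
```

Both routes avoid the explicit Python loop over dates. The join keeps the per-date Series available for other uses, while `transform` is shorter when only the broadcast column is needed.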