Pandas groupby month and year

小编典典

python

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month, e.g. Jan 2013, Feb 2013, Mar 2013, and so on. I will then use the grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum, but I can't get anything to work.

Thank you for any assistance.

2020-12-20

1 answer

小编典典

You can use either resample or Grouper (which resamples under the hood).

First, make sure that the datetime column actually holds datetimes (convert it with pd.to_datetime). It's easier if it's a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M").sum()  # the same (older pandas spelled this resample("M", how='sum'))
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

Note: pd.Grouper(freq="M") was previously written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.
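For reference, here is a minimal end-to-end sketch of the Grouper approach, starting from the string dates in the question. It makes a few assumptions that are not in the session above: the date format string "%d-%b-%y" is inferred from the sample data, the month-start alias "MS" is used because the month-end alias is spelled "M" in older pandas and "ME" from 2.2 onwards, and min_count=1 is passed because newer pandas versions return 0 rather than NaN when summing an empty group.

```python
import pandas as pd

# Data copied from the question; Date is still a column of strings here.
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the strings into real datetimes (format inferred from the sample rows),
# then move them into the index so that Grouper can bin on it.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
df1 = df.set_index("Date")

# "MS" labels each monthly bin by its first day; this is an assumption made
# here to avoid the "M" vs "ME" month-end spelling difference across versions.
g = df1.groupby(pd.Grouper(freq="MS"))

# min_count=1 keeps empty months as NaN instead of 0.
monthly = g.sum(min_count=1)
print(monthly)
```

The result has one row per calendar month from June 2013 to February 2014, including the months with no data, which is convenient for plotting on a regular time axis.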

I had thought the following would work, but it doesn't (because as_index isn't respected? I'm not sure). I'm including it for interest's sake.

If it's a column (it has to be a datetime64 column! As I said, convert it with to_datetime), you can use a PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex…
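One way the reindexing step might look is sketched below. This is not taken from the original answer: the names full_range and filled are illustrative, and pd.period_range is just one way to build the complete list of months. The numeric columns are selected explicitly because recent pandas versions refuse to sum a datetime column.

```python
import pandas as pd

# Same data as above, with Date already converted to datetime64.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Monthly period for each row, then sum only the numeric columns per period.
per = df["Date"].dt.to_period("M")
sums = df.groupby(per)[["abc", "xyz"]].sum()

# Every month between the first and last observed period (hypothetical helper names).
full_range = pd.period_range(per.min(), per.max(), freq="M")

# Reindexing inserts the missing months as rows of NaN.
filled = sums.reindex(full_range)
print(filled)
```

If zeros are preferred over NaN for the empty months, reindex accepts a fill_value argument, e.g. sums.reindex(full_range, fill_value=0).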

2020-12-20