I am trying to add a new index to a pandas DataFrame. The DataFrame looks like this:
                  date  price  neg_vol  pos_vol
0  2017-10-17 01:00:00  51.88       11        4
1  2017-10-17 01:00:00  51.89       10        2
2  2017-10-17 01:00:00  51.90       16       27
3  2017-10-17 01:00:00  51.91        1       10
4  2017-10-17 01:05:00  51.87       12        0
5  2017-10-17 01:05:00  51.88        0       12
6  2017-10-17 01:10:00  51.87        8        0
7  2017-10-17 01:10:00  51.88        0        5
8  2017-10-17 01:15:00  51.87       12        0
9  2017-10-17 01:15:00  51.88        0        8
10 2017-10-17 01:20:00  51.87        6        0
This is the result I want:
    index                 date  price  neg_vol  pos_vol
0       1  2017-10-17 01:00:00  51.88       11        4
1       1  2017-10-17 01:00:00  51.89       10        2
2       1  2017-10-17 01:00:00  51.90       16       27
3       1  2017-10-17 01:00:00  51.91        1       10
4       2  2017-10-17 01:05:00  51.87       12        0
5       2  2017-10-17 01:05:00  51.88        0       12
6       3  2017-10-17 01:10:00  51.87        8        0
7       3  2017-10-17 01:10:00  51.88        0        5
8       4  2017-10-17 01:15:00  51.87       12        0
9       4  2017-10-17 01:15:00  51.88        0        8
10      5  2017-10-17 01:20:00  51.87        6        0
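As you can see, the index column is set based on the date column: rows with the same date share the same index number. I think this could be done with some conditional loops, but I wonder whether there is a simpler way to do it.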
Use ngroup, which numbers each group.
For a new column named index:
df['index'] = df.groupby('date', sort=False).ngroup() + 1
print(df)
                  date  price  neg_vol  pos_vol  index
0  2017-10-17 01:00:00  51.88       11        4      1
1  2017-10-17 01:00:00  51.89       10        2      1
2  2017-10-17 01:00:00  51.90       16       27      1
3  2017-10-17 01:00:00  51.91        1       10      1
4  2017-10-17 01:05:00  51.87       12        0      2
5  2017-10-17 01:05:00  51.88        0       12      2
6  2017-10-17 01:10:00  51.87        8        0      3
7  2017-10-17 01:10:00  51.88        0        5      3
8  2017-10-17 01:15:00  51.87       12        0      4
9  2017-10-17 01:15:00  51.88        0        8      4
10 2017-10-17 01:20:00  51.87        6        0      5
For a new index:
df.index = df.groupby('date', sort=False).ngroup() + 1
print(df)
                 date  price  neg_vol  pos_vol
1 2017-10-17 01:00:00  51.88       11        4
1 2017-10-17 01:00:00  51.89       10        2
1 2017-10-17 01:00:00  51.90       16       27
1 2017-10-17 01:00:00  51.91        1       10
2 2017-10-17 01:05:00  51.87       12        0
2 2017-10-17 01:05:00  51.88        0       12
3 2017-10-17 01:10:00  51.87        8        0
3 2017-10-17 01:10:00  51.88        0        5
4 2017-10-17 01:15:00  51.87       12        0
4 2017-10-17 01:15:00  51.88        0        8
5 2017-10-17 01:20:00  51.87        6        0
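The role of sort=False is easier to see when the keys are not already in order. The following is a minimal sketch on a small, hypothetical frame (the shuffled dates are an assumption made for illustration, not the data from the question): with sort=False the groups are numbered in order of first appearance, while the default numbers them in sorted key order.

```python
import pandas as pd

# Hypothetical, deliberately unsorted dates (not the question's data).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-10-17 01:05:00",
        "2017-10-17 01:05:00",
        "2017-10-17 01:00:00",
        "2017-10-17 01:10:00",
        "2017-10-17 01:00:00",
    ])
})

# sort=False: group numbers follow the order in which each date first appears.
by_appearance = df.groupby("date", sort=False).ngroup() + 1

# Default sort=True: group numbers follow the sorted order of the dates.
by_sorted_key = df.groupby("date").ngroup() + 1

print(by_appearance.tolist())  # [1, 1, 2, 3, 2]
print(by_sorted_key.tolist())  # [2, 2, 1, 3, 1]
```

For the data in the question the dates are already ascending, so both variants should give the same numbering there.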
Another solution is pd.factorize, shown first for the new column and then for the new index:
df['index'] = pd.factorize(df['date'])[0] + 1
df.index = pd.factorize(df['date'])[0] + 1
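To check that both approaches agree, here is a self-contained sketch that rebuilds the sample data from the question and compares the two results. The variable names codes, uniques and via_ngroup are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the sample frame from the question.
dates = (
    ["2017-10-17 01:00:00"] * 4
    + ["2017-10-17 01:05:00"] * 2
    + ["2017-10-17 01:10:00"] * 2
    + ["2017-10-17 01:15:00"] * 2
    + ["2017-10-17 01:20:00"]
)
df = pd.DataFrame({
    "date": pd.to_datetime(dates),
    "price": [51.88, 51.89, 51.90, 51.91, 51.87, 51.88, 51.87, 51.88, 51.87, 51.88, 51.87],
    "neg_vol": [11, 10, 16, 1, 12, 0, 8, 0, 12, 0, 6],
    "pos_vol": [4, 2, 27, 10, 0, 12, 0, 5, 0, 8, 0],
})

# factorize returns (codes, uniques); codes start at 0 in order of first appearance.
codes, uniques = pd.factorize(df["date"])
df["index"] = codes + 1

# The same numbering computed with groupby + ngroup, for comparison.
via_ngroup = df.groupby("date", sort=False).ngroup() + 1

print(df["index"].tolist())  # [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5]
print((df["index"] == via_ngroup).all())  # True
```

One difference to keep in mind: pd.factorize marks missing values with the code -1 by default, so a NaT date would end up as 0 after adding 1.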