```python
import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
print(df)
```

```
   col1
0     1
1     3
2     3
3     1
4     2
5     3
6     2
7     2
```
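I have the Pandas DataFrame above. I want to create another column that compares each value in col1 with the value in the previous row, to see whether they are equal. What is the best way to do this? The result should look like the DataFrame below. Thanks.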
```
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
```
You need `eq` with `shift`:
```python
df['match'] = df.col1.eq(df.col1.shift())
print(df)
```

```
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
```
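Why the first row is always `False`: `shift()` moves every value down one position and fills the first row with `NaN`, and `NaN` never compares equal to anything. A minimal sketch that keeps the shifted values in a helper column so they can be inspected (the `prev` column name is just for illustration, not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# shift() moves values down by one row; row 0 becomes NaN,
# so the integer column is upcast to float64
df['prev'] = df.col1.shift()

# Element-wise comparison with the previous row; 1 vs NaN is False in row 0
df['match'] = df.col1.eq(df['prev'])
print(df)
```

Once the result looks right, the helper column can be dropped with `df.drop(columns='prev')`.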
Or use `==` instead of `eq`; it gives the same result, but on a large DataFrame it was slightly slower in the timings below:
```python
df['match'] = df.col1 == df.col1.shift()
print(df)
```

```
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
```
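`eq` is the method form of the `==` operator, so both spellings should produce the same boolean Series. A quick check on the same sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

prev = df.col1.shift()
via_method = df.col1.eq(prev)      # method form
via_operator = df.col1 == prev     # operator form

# Series.equals compares values, index and dtype (bool for both here)
print(via_method.equals(via_operator))  # True
```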
Timings:
```python
import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
print(df)

# [80000 rows x 1 columns]
df = pd.concat([df] * 10000).reset_index(drop=True)

df['match'] = df.col1 == df.col1.shift()
df['match1'] = df.col1.eq(df.col1.shift())
print(df)
```

```
In [208]: %timeit df.col1.eq(df.col1.shift())
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 933 µs per loop

In [209]: %timeit df.col1 == df.col1.shift()
1000 loops, best of 3: 1 ms per loop
```
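The `%timeit` figures above come from an IPython session, and absolute numbers will vary by machine and pandas version. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module; the repeat and loop counts here are arbitrary choices, not the ones used above:

```python
import timeit

import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
# Repeat the 8-row frame 10000 times -> 80000 rows
df = pd.concat([pd.DataFrame(data)] * 10000).reset_index(drop=True)


def with_eq():
    return df.col1.eq(df.col1.shift())


def with_operator():
    return df.col1 == df.col1.shift()


# Best of 3 repeats of 100 calls each, reported per call in microseconds
for name, func in [('eq', with_eq), ('==', with_operator)]:
    best = min(timeit.repeat(func, repeat=3, number=100)) / 100
    print(f'{name}: {best * 1e6:.0f} µs per loop')
```

Taking the minimum of several repeats, as `%timeit` does, reduces the influence of background noise; a gap as small as the one above may not be reproducible on every run.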