我有以下2个数据帧:
df_a = mukey DI PI 0 100000 35 14 1 1000005 44 14 2 1000006 44 14 3 1000007 43 13 4 1000008 43 13 df_b = mukey niccdcd 0 190236 4 1 190237 6 2 190238 7 3 190239 4 4 190240 7
当我尝试加入这两个数据框时:
join_df = df_a.join(df_b,on='mukey',how='left')
我得到错误:
*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')
为什么会这样呢?数据帧确实具有通用的“ mukey”值。
您发布的数据片段中的错误有点神秘,因为没有通用值,所以联接操作失败,因为这些值不重叠,这需要您在左侧和右侧提供后缀:
In [173]: df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right') Out[173]: mukey_left DI PI mukey_right niccdcd index 0 100000 35 14 NaN NaN 1 1000005 44 14 NaN NaN 2 1000006 44 14 NaN NaN 3 1000007 43 13 NaN NaN 4 1000008 43 13 NaN NaN
merge 之所以有效,是因为它没有此限制:
merge
In [176]: df_a.merge(df_b, on='mukey', how='left') Out[176]: mukey DI PI niccdcd 0 100000 35 14 NaN 1 1000005 44 14 NaN 2 1000006 44 14 NaN 3 1000007 43 13 NaN 4 1000008 43 13 NaN