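I'm new to using DataFrames, and I'd like to know how to perform the SQL equivalent of a left outer join on multiple columns across a series of tables.
Example: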
df1:

    Year  Week  Colour  Val1
    2014  A     Red     50
    2014  B     Red     60
    2014  B     Black   70
    2014  C     Red     10
    2014  D     Green   20

df2:

    Year  Week  Colour  Val2
    2014  A     Black   30
    2014  B     Black   100
    2014  C     Green   50
    2014  C     Red     20
    2014  D     Red     40

df3:

    Year  Week  Colour  Val3
    2013  B     Red     60
    2013  C     Black   80
    2013  B     Black   10
    2013  D     Green   20
    2013  D     Red     50
Essentially, I want to do something like the following SQL (note that df3 is not joined on Year):
    SELECT df1.*, df2.Val2, df3.Val3
    FROM df1
      LEFT OUTER JOIN df2
        ON df1.Year = df2.Year
        AND df1.Week = df2.Week
        AND df1.Colour = df2.Colour
      LEFT OUTER JOIN df3
        ON df1.Week = df3.Week
        AND df1.Colour = df3.Colour
The result should look like this:
    Year  Week  Colour  Val1  Val2  Val3
    2014  A     Red     50    Null  Null
    2014  B     Red     60    Null  60
    2014  B     Black   70    100   Null
    2014  C     Red     10    20    Null
    2014  D     Green   20    Null  Null
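I have tried using merge and join, but I can't figure out how to do this across multiple tables when several joins are involved. Could someone help me with this?
Thanks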
Merge them in two steps: df1 and df2 first, and then the result of that with df3.
In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour'])
I dropped Year from df3 since you don't need it for the last join.
In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']], how='left', on=['Week', 'Colour'])

In [40]: df
Out[40]:
   Year Week Colour  Val1  Val2  Val3
0  2014    A    Red    50   NaN   NaN
1  2014    B    Red    60   NaN    60
2  2014    B  Black    70   100    10
3  2014    C    Red    10    20   NaN
4  2014    D  Green    20   NaN    20

[5 rows x 6 columns]
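For anyone who wants to run the two steps end to end, here is a minimal self-contained sketch. The three DataFrames are rebuilt by hand from the tables in the question, which is an assumption about how the original data was loaded, and the names `result` and `with_year` are only illustrative. The last merge also shows what seems to happen if Year is left in df3: since Year is not a join key there, pandas keeps both copies and suffixes them.

```python
import pandas as pd

# Sample data typed in from the question's tables (assumed construction).
df1 = pd.DataFrame({
    "Year": [2014] * 5,
    "Week": ["A", "B", "B", "C", "D"],
    "Colour": ["Red", "Red", "Black", "Red", "Green"],
    "Val1": [50, 60, 70, 10, 20],
})
df2 = pd.DataFrame({
    "Year": [2014] * 5,
    "Week": ["A", "B", "C", "C", "D"],
    "Colour": ["Black", "Black", "Green", "Red", "Red"],
    "Val2": [30, 100, 50, 20, 40],
})
df3 = pd.DataFrame({
    "Year": [2013] * 5,
    "Week": ["B", "C", "B", "D", "D"],
    "Colour": ["Red", "Black", "Black", "Green", "Red"],
    "Val3": [60, 80, 10, 20, 50],
})

# Step 1: left join df1 to df2 on all three key columns.
s1 = pd.merge(df1, df2, how="left", on=["Year", "Week", "Colour"])

# Step 2: left join the intermediate result to df3 on Week and Colour only.
# Year is dropped from df3 first so it does not come along as an extra column.
result = pd.merge(
    s1,
    df3[["Week", "Colour", "Val3"]],
    how="left",
    on=["Week", "Colour"],
)
print(result)

# For comparison: keeping Year in df3 yields two Year columns,
# suffixed Year_x (from s1) and Year_y (from df3).
with_year = pd.merge(s1, df3, how="left", on=["Week", "Colour"])
print(with_year.columns.tolist())
```

Unmatched rows come back as NaN, which plays the role of SQL's NULL here. Because NaN is a float, Val2 and Val3 may print as floats (for example 100.0) depending on the pandas version.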