Given two data frames:
    df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
    df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

    df1
    #  CustomerId Product
    #           1 Toaster
    #           2 Toaster
    #           3 Toaster
    #           4   Radio
    #           5   Radio
    #           6   Radio

    df2
    #  CustomerId   State
    #           2 Alabama
    #           4 Alabama
    #           6    Ohio
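How can I do database-style, i.e. SQL-style, joins? That is, how do I get:
An inner join of df1 and df2: return only the rows in which the left table has matching keys in the right table.
An outer join of df1 and df2: return all rows from both tables, joining records from the left table that have matching keys in the right table.
A left outer join (or simply left join) of df1 and df2: return all rows from the left table, and any rows with matching keys from the right table.
A right outer join of df1 and df2: return all rows from the right table, and any rows with matching keys from the left table.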
Extra credit:
How can I do a SQL-style select statement?
By using the merge function and its optional parameters:
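Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames on their common variable names, but you will most likely want to specify merge(df1, df2, by = "CustomerId") to make sure you are matching on only the fields you intend. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.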
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
Cross join: merge(x = df1, y = df2, by = NULL)
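Just as with the inner join, you will probably want to pass "CustomerId" to R explicitly as the matching variable. I think it is almost always best to state explicitly the identifiers on which you want to merge; it is safer if the input data.frames change unexpectedly, and easier to read later on.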
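To make the differences between these calls concrete, here is a minimal sketch that runs each join on the df1 and df2 from the question and counts the resulting rows. The variable names (inner, outer, left, right, cross) are only illustrative and are not part of merge itself.

```r
# Rebuild the two example data frames from the question.
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# One merge() call per join type; the result names are illustrative.
inner <- merge(df1, df2, by = "CustomerId")                # matching keys only
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # all rows from both
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # all rows from df1
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)  # all rows from df2
cross <- merge(df1, df2, by = NULL)                        # Cartesian product

nrow(inner)  # 3: customers 2, 4 and 6 appear in both frames
nrow(outer)  # 6: every customer from either frame
nrow(left)   # 6: every customer in df1
nrow(right)  # 3: every customer in df2
nrow(cross)  # 18: 6 rows times 3 rows

# Rows of df1 without a match in df2 get NA in the State column.
sum(is.na(left$State))  # 3: customers 1, 3 and 5
```

Because every CustomerId in df2 also occurs in df1, the right join happens to return the same customers as the inner join here; with other data the two would differ.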
You can merge on multiple columns by giving by a vector, e.g. by = c("CustomerId", "OrderId").
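As a small illustration of a multi-column key, the sketch below uses two hypothetical data frames, orders and shipments, that are not part of the question. Both are assumed to carry a CustomerId and an OrderId column.

```r
# Hypothetical example data: neither frame appears in the original question.
orders <- data.frame(CustomerId = c(1, 1, 2),
                     OrderId    = c(10, 11, 10),
                     Product    = c("Toaster", "Radio", "Radio"))
shipments <- data.frame(CustomerId = c(1, 2, 2),
                        OrderId    = c(10, 10, 12),
                        State      = c("Alabama", "Ohio", "Ohio"))

# A row matches only when BOTH key columns agree.
both <- merge(orders, shipments, by = c("CustomerId", "OrderId"))

nrow(both)  # 2: the pairs (1, 10) and (2, 10) occur in both frames
```

Merging on CustomerId alone would also pair rows whose order numbers differ, which is why the compound key matters here.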
If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2", where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
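A minimal sketch of the by.x / by.y case follows. It reuses the question's data but renames the key columns to the placeholder names used above; the frame names left_df and right_df are likewise only illustrative.

```r
# Same data as the question, but with differently named key columns.
left_df <- data.frame(CustomerId_in_df1 = c(1:6),
                      Product = c(rep("Toaster", 3), rep("Radio", 3)))
right_df <- data.frame(CustomerId_in_df2 = c(2, 4, 6),
                       State = c(rep("Alabama", 2), rep("Ohio", 1)))

# by.x names the key in the first frame, by.y the key in the second.
joined <- merge(left_df, right_df,
                by.x = "CustomerId_in_df1",
                by.y = "CustomerId_in_df2")

nrow(joined)   # 3: an inner join, as before
names(joined)  # the key column keeps its name from the first data frame
```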