我有一堆需要加载到MySQL数据库中的CSV数据。好吧,也许是CSV格式的。( 编辑 :实际上,它看起来像RFC4180中描述的内容)
每行是一个用逗号分隔的双引号字符串的列表。要转义出现在列值中的任何双引号,请使用双引号。允许使用反斜杠表示自己。
例如,该行:
"", "\wave\", ""hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""
如果解析为JSON,则应为:
[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]
我正在尝试使用LOAD DATA读取CSV,但是我遇到了一些奇怪的行为。
LOAD DATA
例如,考虑我是否有一个简单的两列表
shell% mysql exampledb -e "describe person" +-------+-----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-----------+------+-----+---------+-------+ | ID | int(11) | YES | | NULL | | | UID | char(255) | YES | | NULL | | +-------+-----------+------+-----+---------+-------+ shell%
如果我的输入文件的第一行非标题行以结尾"":
""
shell% cat temp-1.csv "ID","UID" "9","" "0","Steve the Pirate" "1","\Alpha" "2","Hoban ""Wash"" Washburne" "3","Pastor Veal" "4","Tucker" "10","" "5","Simon" "6","Sonny" "7","Wat\"
我可以加载除第一行之外的所有非标题行:
mysql> DELETE FROM person; Query OK, 0 rows affected (0.00 sec) mysql> LOAD DATA LOCAL INFILE 'temp-1.csv' INTO TABLE person FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES ; Query OK, 9 rows affected (0.00 sec) Records: 9 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT * FROM person; +------+------------------------+ | ID | UID | +------+------------------------+ | 0 | Steve the Pirate | | 10 | | | 1 | \Alpha | | 2 | Hoban "Wash" Washburne | | 3 | Pastor Veal | | 4 | Tucker | | 5 | Simon | | 6 | Sonny | | 7 | Wat\ | +------+------------------------+ 9 rows in set (0.00 sec)
或者,我可以加载包括标题在内的所有行:
mysql> DELETE FROM person; Query OK, 9 rows affected (0.00 sec) mysql> LOAD DATA LOCAL INFILE 'temp-1.csv' INTO TABLE person FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES ; Query OK, 11 rows affected, 1 warning (0.01 sec) Records: 11 Deleted: 0 Skipped: 0 Warnings: 1 mysql> show warnings; +---------+------+--------------------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------------------+ | Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 | +---------+------+--------------------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM person; +------+------------------------+ | ID | UID | +------+------------------------+ | 0 | UID | | 9 | | | 0 | Steve the Pirate | | 10 | | | 1 | \Alpha | | 2 | Hoban "Wash" Washburne | | 3 | Pastor Veal | | 4 | Tucker | | 5 | Simon | | 6 | Sonny | | 7 | Wat\ | +------+------------------------+ 11 rows in set (0.00 sec)
如果我的输入文件中没有行结尾"":
shell% cat temp-2.csv "ID","UID" "0","Steve the Pirate" "1","\Alpha" "2","Hoban ""Wash"" Washburne" "3","Pastor Veal" "4","Tucker" "5","Simon" "6","Sonny" "7","Wat\"
那么我就可以不加载任何行:
mysql> DELETE FROM person; Query OK, 11 rows affected (0.00 sec) mysql> LOAD DATA LOCAL INFILE 'temp-2.csv' INTO TABLE person FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES ; Query OK, 0 rows affected (0.00 sec) Records: 0 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT * FROM person; Empty set (0.00 sec)
mysql> DELETE FROM person; Query OK, 0 rows affected (0.00 sec) mysql> LOAD DATA LOCAL INFILE 'temp-2.csv' INTO TABLE person FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES ; Query OK, 9 rows affected, 1 warning (0.03 sec) Records: 9 Deleted: 0 Skipped: 0 Warnings: 1 mysql> show warnings; +---------+------+--------------------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------------------+ | Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 | +---------+------+--------------------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM person; +------+------------------------+ | ID | UID | +------+------------------------+ | 0 | UID | | 0 | Steve the Pirate | | 1 | \Alpha | | 2 | Hoban "Wash" Washburne | | 3 | Pastor Veal | | 4 | Tucker | | 5 | Simon | | 6 | Sonny | | 7 | Wat\ | +------+------------------------+ 9 rows in set (0.00 sec)
因此,既然我发现了许多解决方法,我该如何使用LOAD DATA这些文件中的数据将其导入数据库?
根据的文档LOADDATA,默认情况下,将双引号双引号视为双引号:
LOADDATA
如果字段以ENCLOSED BY字符开头,则只有当该字段或行的TERMINATED BY序列紧随其后,该字符的实例才被视为终止字段值。为避免歧义,可以将字段值中ENCLOSED BY字符的出现次数加倍,并解释为该字符的单个实例。例如,如果指定了ENCLOSED BY’“’,则引号的处理如下所示: "The ""BIG"" boss" -> The "BIG" boss The "BIG" boss -> The "BIG" boss The ""BIG"" boss -> The ""BIG"" boss
如果字段以ENCLOSED BY字符开头,则只有当该字段或行的TERMINATED BY序列紧随其后,该字符的实例才被视为终止字段值。为避免歧义,可以将字段值中ENCLOSED BY字符的出现次数加倍,并解释为该字符的单个实例。例如,如果指定了ENCLOSED BY’“’,则引号的处理如下所示:
"The ""BIG"" boss" -> The "BIG" boss The "BIG" boss -> The "BIG" boss The ""BIG"" boss -> The ""BIG"" boss
因此,我需要做的就是\通过使用禁用解释为转义字符ESCAPED BY ''。
\
ESCAPED BY ''
LOAD DATA LOCAL INFILE 'temp-1.csv' INTO TABLE person FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '' LINES TERMINATED BY '\n' IGNORE 1 LINES ;