小编典典

从CSV文件中加载数据,其中双引号用作转义字符

sql

我有一堆需要加载到MySQL数据库中的CSV数据。好吧,也许是CSV格式的。( 编辑实际上,它看起来像RFC4180中描述的内容

每行是一个用逗号分隔的双引号字符串的列表。要转义出现在列值中的任何双引号,请使用双引号。允许使用反斜杠表示自己。

例如,该行:

"", "\wave\", ""hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""

如果解析为JSON,则应为:

[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]

我正在尝试使用LOAD DATA读取CSV,但是我遇到了一些奇怪的行为。


例如,考虑我是否有一个简单的两列表

shell% mysql exampledb -e "describe person"
+-------+-----------+------+-----+---------+-------+
| Field | Type      | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| ID    | int(11)   | YES  |     | NULL    |       |
| UID   | char(255) | YES  |     | NULL    |       |
+-------+-----------+------+-----+---------+-------+
shell%

如果我的输入文件的第一行非标题行以结尾""

shell% cat temp-1.csv
"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"

我可以加载除第一行之外的所有非标题行:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-1.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 1 LINES
       ;
Query OK, 9 rows affected (0.00 sec)
Records: 9  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)

或者,我可以加载包括标题在内的所有行:

mysql> DELETE FROM person;
Query OK, 9 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-1.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 0 LINES
       ;
Query OK, 11 rows affected, 1 warning (0.01 sec)
Records: 11  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    9 |                        |
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
11 rows in set (0.00 sec)

如果我的输入文件中没有行结尾""

shell% cat temp-2.csv
"ID","UID"
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"5","Simon"
"6","Sonny"
"7","Wat\"

那么我就可以不加载任何行:

mysql> DELETE FROM person;
Query OK, 11 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-2.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 1 LINES
       ;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person;
Empty set (0.00 sec)

或者,我可以加载包括标题在内的所有行:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-2.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 0 LINES
       ;
Query OK, 9 rows affected, 1 warning (0.03 sec)
Records: 9  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    0 | Steve the Pirate       |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)

因此,既然我发现了许多解决方法,我该如何使用LOAD DATA这些文件中的数据将其导入数据库?


阅读 504

收藏
2021-03-23

共1个答案

小编典典

根据的文档LOADDATA,默认情况下,将双引号双引号视为双引号

如果字段以ENCLOSED BY字符开头,则只有当该字段或行的TERMINATED
BY序列紧随其后,该字符的实例才被视为终止字段值。为避免歧义,可以将字段值中ENCLOSED
BY字符的出现次数加倍,并解释为该字符的单个实例。例如,如果指定了ENCLOSED BY’“’,则引号的处理如下所示:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss

因此,我需要做的就是\通过使用禁用解释为转义字符ESCAPED BY ''

LOAD DATA
  LOCAL INFILE 'temp-1.csv'
  INTO TABLE person
  FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
    ESCAPED BY ''
  LINES
    TERMINATED BY '\n'
  IGNORE 1 LINES
;
2021-03-23