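We have a web app that exports CSV files containing foreign characters in UTF-8, without a BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with a BOM; Excel/Win handles it fine, but Excel/Mac still shows gibberish. I'm using Excel 2003/Win and Excel 2011/Mac. Here are all the encodings I tried: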
| Encoding | BOM | Win | Mac |
|---|---|---|---|
| utf-8 | -- | scrambled | scrambled |
| utf-8 | BOM | WORKS | scrambled |
| utf-16 | -- | file not recognized | file not recognized |
| utf-16 | BOM | file not recognized | Chinese gibberish |
| utf-16LE | -- | file not recognized | file not recognized |
| utf-16LE | BOM | characters OK, row data all in first field | same as Win |
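The best one is UTF-16LE with a BOM, but the file isn't recognized as CSV: all row data ends up in the first field. The field separator is a comma, but switching to a semicolon doesn't change anything.
Is there any encoding that works in both?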
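To see what actually differs between the variants in the table, here is a minimal Python sketch that encodes the same CSV text each way tried above. The function name `encode_variants` and the sample text are illustrative assumptions; it only shows the leading bytes of each file and says nothing about how a given Excel version will interpret them. The plain `utf-16` rows are left out because their byte order is not stated.

```python
import codecs


def encode_variants(text: str) -> dict[str, bytes]:
    """Encode the same CSV text in several of the ways tried above (sketch)."""
    return {
        # Plain UTF-8: no signature, so the reader has to guess the encoding.
        "utf-8": text.encode("utf-8"),
        # UTF-8 with BOM: Python's "utf-8-sig" codec prepends EF BB BF.
        "utf-8 BOM": text.encode("utf-8-sig"),
        # UTF-16LE without BOM: two bytes per BMP character, low byte first.
        "utf-16LE": text.encode("utf-16-le"),
        # UTF-16LE with BOM: FF FE followed by the little-endian payload.
        "utf-16LE BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    }


if __name__ == "__main__":
    sample = "name,city\r\nJosé,Zürich\r\n"  # hypothetical sample data
    for name, data in encode_variants(sample).items():
        # Show the first 8 bytes of each variant as hex.
        print(f"{name:13} {data[:8].hex(' ')}")
```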
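I found `WINDOWS-1252` to be the least frustrating encoding when dealing with Excel. Since it is basically Microsoft's own proprietary character set, one can assume it will work in both the Mac and Windows versions of MS-Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector that reads the data correctly.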
Depending on your system and the tools you use, this encoding may also be named `CP1252`, `ANSI`, `Windows (ANSI)`, `MS-ANSI` or just `Windows`, among other variations.
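As one illustration of the naming point, Python exposes this encoding as its `cp1252` codec and accepts `windows-1252` as an alias. The sketch below, using a hypothetical helper `csv_bytes_cp1252`, serializes rows with the standard `csv` module and encodes them as Windows-1252. Keep in mind that Windows-1252 covers only Western European scripts, so characters such as CJK text raise `UnicodeEncodeError` instead of being exported.

```python
import codecs
import csv
import io

# Python registers "windows-1252" as an alias of its "cp1252" codec.
assert codecs.lookup("windows-1252").name == "cp1252"


def csv_bytes_cp1252(rows) -> bytes:
    """Serialize rows as comma-separated text and encode it as Windows-1252."""
    buf = io.StringIO()
    # The default "excel" dialect uses commas and \r\n line endings.
    csv.writer(buf).writerows(rows)
    # Strict encoding (the default): unrepresentable characters raise
    # UnicodeEncodeError rather than being silently replaced.
    return buf.getvalue().encode("windows-1252")
```

The returned bytes can be written to a file opened with `open(path, "wb")` or sent as the body of the web app's download response.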
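This encoding is a superset of `ISO-8859-1` (a.k.a. `LATIN1` and others), so you can fall back to `ISO-8859-1` if for some reason you cannot use `WINDOWS-1252`. Be advised that `ISO-8859-1` lacks some of the characters in `WINDOWS-1252`, as shown below: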
| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name | Unicode Range |
|---|---|---|---|---|---|---|---|
| € | 128 | 8364 | 0x80 | U+20AC | `&euro;` | euro sign | Currency Symbols |
| ‚ | 130 | 8218 | 0x82 | U+201A | `&sbquo;` | single low-9 quotation mark | General Punctuation |
| ƒ | 131 | 402 | 0x83 | U+0192 | `&fnof;` | Latin small letter f with hook | Latin Extended-B |
| „ | 132 | 8222 | 0x84 | U+201E | `&bdquo;` | double low-9 quotation mark | General Punctuation |
| … | 133 | 8230 | 0x85 | U+2026 | `&hellip;` | horizontal ellipsis | General Punctuation |
| † | 134 | 8224 | 0x86 | U+2020 | `&dagger;` | dagger | General Punctuation |
| ‡ | 135 | 8225 | 0x87 | U+2021 | `&Dagger;` | double dagger | General Punctuation |
| ˆ | 136 | 710 | 0x88 | U+02C6 | `&circ;` | modifier letter circumflex accent | Spacing Modifier Letters |
| ‰ | 137 | 8240 | 0x89 | U+2030 | `&permil;` | per mille sign | General Punctuation |
| Š | 138 | 352 | 0x8A | U+0160 | `&Scaron;` | Latin capital letter S with caron | Latin Extended-A |
| ‹ | 139 | 8249 | 0x8B | U+2039 | `&lsaquo;` | single left-pointing angle quotation mark | General Punctuation |
| Œ | 140 | 338 | 0x8C | U+0152 | `&OElig;` | Latin capital ligature OE | Latin Extended-A |
| Ž | 142 | 381 | 0x8E | U+017D | | Latin capital letter Z with caron | Latin Extended-A |
| ‘ | 145 | 8216 | 0x91 | U+2018 | `&lsquo;` | left single quotation mark | General Punctuation |
| ’ | 146 | 8217 | 0x92 | U+2019 | `&rsquo;` | right single quotation mark | General Punctuation |
| “ | 147 | 8220 | 0x93 | U+201C | `&ldquo;` | left double quotation mark | General Punctuation |
| ” | 148 | 8221 | 0x94 | U+201D | `&rdquo;` | right double quotation mark | General Punctuation |
| • | 149 | 8226 | 0x95 | U+2022 | `&bull;` | bullet | General Punctuation |
| – | 150 | 8211 | 0x96 | U+2013 | `&ndash;` | en dash | General Punctuation |
| — | 151 | 8212 | 0x97 | U+2014 | `&mdash;` | em dash | General Punctuation |
| ˜ | 152 | 732 | 0x98 | U+02DC | `&tilde;` | small tilde | Spacing Modifier Letters |
| ™ | 153 | 8482 | 0x99 | U+2122 | `&trade;` | trade mark sign | Letterlike Symbols |
| š | 154 | 353 | 0x9A | U+0161 | `&scaron;` | Latin small letter s with caron | Latin Extended-A |
| › | 155 | 8250 | 0x9B | U+203A | `&rsaquo;` | single right-pointing angle quotation mark | General Punctuation |
| œ | 156 | 339 | 0x9C | U+0153 | `&oelig;` | Latin small ligature oe | Latin Extended-A |
| ž | 158 | 382 | 0x9E | U+017E | | Latin small letter z with caron | Latin Extended-A |
| Ÿ | 159 | 376 | 0x9F | U+0178 | `&Yuml;` | Latin capital letter Y with diaeresis | Latin Extended-A |
Note that the euro sign in particular is missing from `ISO-8859-1`. This table can be found at Alan Wood's site.
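The table can be reproduced from Python's own `cp1252` codec: these are the bytes 0x80 to 0x9F that Windows-1252 maps to printable characters, whereas ISO-8859-1 uses the same bytes for C1 control codes. This is a minimal sketch; `cp1252_extras` is a hypothetical helper name, and the character names come from `unicodedata`, so they are upper-case rather than the table's spelling.

```python
import unicodedata


def cp1252_extras():
    """List the bytes 0x80-0x9F that Windows-1252 maps to printable characters.

    In ISO-8859-1 these same bytes are C1 control codes, so this is the set of
    characters you lose when falling back from WINDOWS-1252 to ISO-8859-1.
    """
    extras = []
    for byte in range(0x80, 0xA0):
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined in cp1252.
            continue
        extras.append(
            (f"0x{byte:02X}", char, f"U+{ord(char):04X}", unicodedata.name(char))
        )
    return extras


if __name__ == "__main__":
    for row in cp1252_extras():
        print(*row, sep="\t")
```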
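Conversion is done differently in every tool and language. However, suppose you have a file `query_result.csv` that you know is `UTF-8` encoded. To convert it to `WINDOWS-1252` using `iconv`: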
iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
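If `iconv` is not available, roughly the same conversion can be done in Python. This is a sketch under the assumption that the input really is UTF-8 (optionally with a BOM); `utf8_to_cp1252` is a hypothetical name, and it raises `UnicodeEncodeError` on any character Windows-1252 cannot represent rather than dropping it.

```python
def utf8_to_cp1252(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-8 text file as Windows-1252 (sketch)."""
    # "utf-8-sig" also accepts, and drops, a leading UTF-8 BOM if present.
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # Strict encoding: fail loudly on characters cp1252 lacks.
    with open(dst_path, "w", encoding="cp1252", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    utf8_to_cp1252("query_result.csv", "query_result-win.csv")
```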