How do I open a Unicode text file inside a zip archive?

小编典典

python

I tried:

with zipfile.ZipFile("5.csv.zip", "r") as zfile:
    for name in zfile.namelist():
        with zfile.open(name, 'rU') as readFile:
                line = readFile.readline()
                print(line)
                split = line.split('\t')

It prints:

b'$0.0\t1822\t1\t1\t1\n'
Traceback (most recent call last)
File "zip.py", line 6
    split = line.split('\t')
TypeError: Type str doesn't support the buffer API

How can I open the text file as Unicode (str) rather than as bytes (b'...')?

2021-01-20

1 answer

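Edit: For Python 3, wrapping the file in io.TextIOWrapper, as another answer describes, is the best choice. The answer below may still be useful for 2.x. I don't think anything below is actually incorrect even for 3.x, but io.TextIOWrapper is still the better option.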
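As a minimal sketch of the io.TextIOWrapper approach on Python 3: the helper name read_first_lines and its parameters are hypothetical, and UTF-8 is assumed as the default encoding. The wrapper decodes the bytes stream returned by ZipFile.open, so readline() yields str and split('\t') works.

```python
import io
import zipfile


def read_first_lines(archive, encoding="utf-8"):
    """Return {member name: first line as str} for every file in the archive.

    `archive` may be a path or a binary file-like object (hypothetical helper).
    """
    first_lines = {}
    with zipfile.ZipFile(archive, "r") as zfile:
        for name in zfile.namelist():
            # ZipFile.open returns a binary stream; TextIOWrapper decodes it.
            with io.TextIOWrapper(zfile.open(name), encoding=encoding) as text_file:
                first_lines[name] = text_file.readline()
    return first_lines
```

For the archive in the question, read_first_lines("5.csv.zip") should return str lines that can be split on '\t' directly.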

If the file is UTF-8, you can use:

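# the rest of the code as above, then:
# (ZipFile.open rejects mode 'rU' on Python 3.6+, so the default 'r' is used)
with zfile.open(name) as readFile:
    # readline() returns bytes; decode it to get a str
    line = readFile.readline().decode('utf8')
    # etc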

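If you want to iterate over the file, you can use codecs.iterdecode; note that it wraps an iterator of bytes, so it cannot be combined with readline().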

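import codecs

# (ZipFile.open rejects mode 'rU' on Python 3.6+, so the default 'r' is used)
with zfile.open(name) as readFile:
    # iterating the file yields bytes lines; iterdecode turns them into str
    for line in codecs.iterdecode(readFile, 'utf8'):
        print(line)
        # etc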

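Note that neither approach is necessarily safe for multi-byte encodings. For example, little-endian UTF-16 represents the newline character with the bytes b'\x0A\x00'. A tool that is not Unicode-aware and looks for newlines will split that incorrectly, leaving the null byte at the start of the next line. In that case you have to use something that does not try to split the input on newlines, such as ZipFile.read, and then decode the whole byte string at once. This is not a problem with UTF-8.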
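A rough sketch of the ZipFile.read route for encodings such as UTF-16: the helper name read_member_lines is hypothetical, and the caller is assumed to know the member's encoding. The whole member is read as bytes and decoded in one step, and only the resulting str is split into lines.

```python
import io
import zipfile


def read_member_lines(archive, name, encoding):
    """Read one archive member fully, decode it once, then split into lines.

    `archive` may be a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(archive, "r") as zfile:
        data = zfile.read(name)      # raw bytes of the whole member
    text = data.decode(encoding)     # decode before looking for newlines
    return text.splitlines()
```

Because the line splitting happens on the decoded str, a newline encoded as b'\x0A\x00' is never cut in half. The trade-off is that the whole member is held in memory at once.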

2021-01-20