Python raises UnicodeEncodeError even though I am calling str.decode(). Why?

小编典典


python

Consider the following function:

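import htmlentitydefs

def escape(text):
    print repr(text)  # show exactly what was passed in
    escaped_chars = []
    for c in text:
        try:
            # Succeeds for ASCII; expected to fail for everything else.
            c = c.decode('ascii')
        except UnicodeDecodeError:
            # Replace the character with its named HTML entity.
            c = '&{};'.format(htmlentitydefs.codepoint2name[ord(c)])
        escaped_chars.append(c)
    return ''.join(escaped_chars)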

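It is supposed to replace every non-ASCII character with the corresponding entity from htmlentitydefs. Unfortunately, Python raises

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)

when the variable text contains a string whose repr() is u'Tam\xe1s Horv\xe1th'.

But I am not using str.encode(). I am only using str.decode(). Am I missing something?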


2021-01-20

1 answer

小编典典

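Python 2 has two kinds of strings: character strings (the unicode type) and byte strings (the str type). The code you pasted is written for byte strings, but you are passing it a unicode object. Decoding only makes sense for bytes, so when decode() is called on a unicode object, Python 2 first implicitly encodes it to a byte string with the default codec (ASCII), and that hidden encode step is what raises UnicodeEncodeError. You need a similar function that works on character strings.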
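The failing step can be reproduced in isolation. This is only a minimal sketch: the helper name implicit_encode_fails is hypothetical, and it calls encode('ascii') explicitly to stand in for the conversion that Python 2 attempts behind the scenes, assuming the default codec is ASCII as on a stock installation. Written this way it also runs on Python 3, where str no longer has a decode() method at all.

```python
text = u'Tam\xe1s Horv\xe1th'

def implicit_encode_fails(c):
    """Return True if encoding c to ASCII raises UnicodeEncodeError."""
    try:
        # Stand-in for the hidden step Python 2 performs before
        # unicode.decode('ascii') gets a chance to decode anything.
        c.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False

# Collect the characters that would trigger the error in the loop.
failing = [c for c in text if implicit_encode_fails(c)]
```

Only the two occurrences of u'\xe1' end up in failing, which matches the character named in the traceback.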

Perhaps something like this:

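import htmlentitydefs

def uescape(text):
    """Replace characters outside printable ASCII with HTML entities."""
    escaped_chars = []
    for c in text:
        code = ord(c)
        if code < 32 or code > 126:
            name = htmlentitydefs.codepoint2name.get(code)
            # Fall back to a numeric reference when no named entity exists
            # (control characters such as '\n', for example).
            c = u'&{};'.format(name) if name else u'&#{};'.format(code)
        escaped_chars.append(c)
    return u''.join(escaped_chars)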
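If numeric character references are acceptable instead of named entities, the codec machinery can do the replacement without a hand-written loop. The following is a sketch of that alternative, not the function above: it relies on the standard 'xmlcharrefreplace' error handler, and the variable names are illustrative only.

```python
text = u'Tam\xe1s Horv\xe1th'

# Encode to ASCII; anything that does not fit becomes &#NNN;.
escaped = text.encode('ascii', 'xmlcharrefreplace')

# The result is a byte string; decode it if a character string is needed.
escaped_text = escaped.decode('ascii')
```

Unlike uescape, this leaves ASCII control characters untouched, and neither approach escapes markup characters such as & or <.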

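I do wonder whether either of these functions is really necessary for you. If it were me, I would choose UTF-8 as the character encoding of the resulting document, handle the document as character strings throughout (without worrying about entities), and call content.encode('UTF-8') as the very last step before delivering it to the client. Depending on the web framework you choose, you may even be able to pass character strings directly to its API and let it work out how to set the encoding.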
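As a rough illustration of encoding only at the boundary, here is a sketch that is not tied to any particular framework. The function name build_response and the header list are hypothetical; the only call taken from the text above is content.encode('UTF-8').

```python
def build_response(content):
    """Encode a character string once, right before it leaves the program."""
    body = content.encode('UTF-8')
    headers = [
        # Tell the client which encoding the bytes are in.
        ('Content-Type', 'text/html; charset=utf-8'),
        # Length is counted in bytes, not characters.
        ('Content-Length', str(len(body))),
    ]
    return headers, body

headers, body = build_response(u'<p>Tam\xe1s Horv\xe1th</p>')
```

Everything before build_response works with character strings, so no entity escaping is needed for non-ASCII text; each u'\xe1' simply becomes two bytes in the output.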

2021-01-20