What is the difference between encode/decode?

小编典典

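I've never been sure that I understand the difference between str/unicode decode and encode.

I know that str().decode() is for when you have a string of bytes that you know is in a certain character encoding: given that encoding name, it returns a unicode string.

I know that unicode().encode() converts unicode characters into a string of bytes according to a given encoding name.

But I don't understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I've gotten wrong above?

EDIT:

Several answers give information on what .encode does on a string, but no one seems to know what .decode does for unicode.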

2022-07-31

1 answer

小编典典

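The decode method of unicode strings really has no applications at all (unless you have some non-text data in a unicode string for some reason; see below). I think it is there mainly for historical reasons. In Python 3 it is gone completely.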
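For comparison, here is a minimal Python 3 sketch of that last point. The variable names are only illustrative and not from the original answer; the sketch assumes a Python 3 interpreter, where text is str and binary data is bytes.

```python
# Python 3: each type only offers the conversion that makes sense for it.
text = "\u00f6"                      # the character "ö" as text (str)
data = text.encode("utf-8")          # str -> bytes
roundtrip = data.decode("utf-8")     # bytes -> str

# The "wrong-way" methods from Python 2 no longer exist.
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")

print(data, roundtrip == text, str_has_decode, bytes_has_encode)
```

Calling text.decode() or data.encode() here would raise AttributeError, so the ambiguity cannot arise in the first place.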

unicode().decode() will first perform an implicit encoding of s using the default (ascii) codec. Verify this like so:

>>> s = u'ö'
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0:
ordinal not in range(128)

>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0:
ordinal not in range(128)

The error messages are exactly the same.

For str().encode() it is the other way around: it attempts an implicit decoding of s with the default encoding:

>>> s = 'ö'
>>> s.decode('utf-8')
u'\xf6'
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0:
ordinal not in range(128)

Used like this, str().encode() is also superfluous.
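The two implicit steps described above can be spelled out explicitly. The following is only a sketch in Python 3: the helper names py2_style_unicode_decode, py2_style_str_encode and error_name are hypothetical, and the hard-coded "ascii" stands in for the Python 2 default encoding.

```python
def py2_style_unicode_decode(text, encoding="ascii"):
    # Hypothetical helper: Python 2's unicode.decode() first encoded the
    # text implicitly with the default (ascii) codec, then decoded it.
    return text.encode("ascii").decode(encoding)


def py2_style_str_encode(data, encoding="ascii"):
    # Hypothetical helper: Python 2's str.encode() first decoded the bytes
    # implicitly with the default (ascii) codec, then encoded them.
    return data.decode("ascii").encode(encoding)


def error_name(func, arg):
    # Return the name of the Unicode error raised by func(arg), if any.
    try:
        func(arg)
    except UnicodeError as exc:
        return type(exc).__name__
    return None


print(error_name(py2_style_unicode_decode, "\u00f6"))
print(error_name(py2_style_str_encode, "\u00f6".encode("utf-8")))
```

With pure ASCII input both helpers succeed and return their input unchanged, which is why the implicit step often went unnoticed until non-ASCII data appeared.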

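But there is another application of the latter method that is useful: some encodings have nothing to do with character sets, and so they can be applied to 8-bit strings in a meaningful way: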

>>> s.encode('zip')
'x\x9c;\xbc\r\x00\x02>\x01z'
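In Python 3 this kind of bytes-to-bytes transform is no longer a method on the string types. A rough equivalent, assuming the standard library's codecs module and its "zlib_codec" codec name, might look like this:

```python
import codecs
import zlib

raw = "\u00f6".encode("utf-8")                      # the two bytes 0xc3 0xb6

# Bytes-to-bytes codecs are reached through codecs.encode/codecs.decode.
compressed = codecs.encode(raw, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

# The codec wraps the zlib module, so zlib can read its output directly.
same_as_zlib = zlib.decompress(compressed)

print(restored == raw, same_as_zlib == raw)
```

Calling zlib.compress and zlib.decompress directly is arguably clearer in new code, since it avoids overloading the word "encode" once more.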

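You are right, though: the ambiguous use of "encoding" for both of these applications is... awkward. Again, with the separate bytes and str types in Python 3, this is no longer an issue.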

2022-07-31