小编典典

How to get string objects instead of Unicode from JSON?

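I'm using Python 2 to parse JSON from ASCII-encoded text files.

When I load these files with `json` or `simplejson`, all my string values are converted to Unicode objects instead of string objects. The problem is that I have to use the data with some libraries that only accept string objects. I can't change the libraries, nor can I update them.

Is it possible to get string objects instead of Unicode objects?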

Example

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

Update

This question was asked a long time ago, when I was stuck with Python 2. Today, a simple and clean solution is to use a recent version of Python, that is, Python 3 or later.
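To illustrate the update, here is a minimal sketch of what the same load looks like on Python 3. The variable names are only illustrative. Note that on Python 3 `str` is the text type, so a library that expects bytes would still need an explicit `.encode()`.

```python
import json

# On Python 3, json.loads returns native str objects; there is no separate
# unicode type and no u'' prefix in the repr.
values = json.loads('["a", "b"]')

# True on Python 3; on Python 2 the values would be unicode objects instead.
all_native_str = all(type(v) is str for v in values)
print(values, all_native_str)
```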

2022-04-22

1 answer

小编典典

A solution with `object_hook`

[Edit]: Updated for Python 2.7 and 3.x compatibility.

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    # a native str (a byte string on Python 2) needs no conversion
    if isinstance(data, str):
        return data

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.items() # changed to .items() for python 2.7/3
        }

    # python 3 compatible duck-typing
    # if this is a unicode string, return its string representation
    if str(type(data)) == "<type 'unicode'>":
        return data.encode('utf-8')

    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work, and why would I use it?

Mark Amery's function is shorter and clearer than these ones, so what is the point of them? Why would you want to use them?

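Purely for performance. Mark's answer first decodes the JSON text fully into unicode strings, then recurses through the entire decoded value to convert every string to a byte string. This has a couple of undesirable effects: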

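A copy of the entire decoded structure gets created in memory
If your JSON object is very deeply nested (500 levels or more), you will hit Python's maximum recursion depth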
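For comparison, the decode-then-convert approach described above can be sketched roughly as follows. This is not Mark Amery's actual code; `byteify_after_decode` and `text_type` are hypothetical names used only for illustration. On Python 3 the converted values are `bytes`, on Python 2 they are `str`.

```python
import json

# The text type on both Python 2 (unicode) and Python 3 (str).
text_type = type(u"")

def byteify_after_decode(value):
    """Hypothetical helper: recursively copy `value`, encoding text as UTF-8."""
    if isinstance(value, dict):
        return {byteify_after_decode(k): byteify_after_decode(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [byteify_after_decode(item) for item in value]
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value

# Step 1: decode everything into text strings.
decoded = json.loads('{"foo": ["bar", {"baz": 1}]}')
# Step 2: walk the whole structure again, building a second copy.
converted = byteify_after_decode(decoded)
```

Both drawbacks are visible here: `converted` is a second, separate structure alongside `decoded`, and the helper uses one level of Python recursion per level of nesting.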

This answer mitigates both of those performance issues by using the `object_hook` parameter of `json.load` and `json.loads`. From the docs:

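`object_hook` is an optional function that will be called with the result of any object literal decoded (a `dict`). The return value of `object_hook` will be used instead of the `dict`. This feature can be used to implement custom decoders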
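A small sketch of that behaviour, using the standard `json` module: the hook is called once per decoded JSON object, innermost objects first, and whatever it returns replaces the `dict`. The names `recording_hook` and `calls` are illustrative only.

```python
import json

calls = []

def recording_hook(decoded_dict):
    # Record which dict we were handed (by its sorted keys), then
    # return it unchanged so decoding proceeds as usual.
    calls.append(sorted(decoded_dict.keys()))
    return decoded_dict

json.loads('{"outer": {"inner": {"leaf": 1}}, "other": 2}',
           object_hook=recording_hook)
print(calls)  # [['leaf'], ['inner'], ['other', 'outer']]

# The hook's return value is used in place of the dict:
replaced = json.loads('{"a": 1}', object_hook=lambda d: sorted(d.items()))
print(replaced)  # [('a', 1)]
```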

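Since dictionaries nested many levels deep inside other dictionaries are passed to `object_hook` as they are decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.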

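Mark's answer is not suitable for use as an `object_hook` as it stands, because it recurses into nested dictionaries. In this answer we prevent that recursion with the `ignore_dicts` parameter to `_byteify`, which is passed to it at all times except when `object_hook` hands it a new `dict` to byteify. The `ignore_dicts` flag tells `_byteify` to ignore `dict`s, since they have already been byteified.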

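Finally, our implementations of `json_load_byteified` and `json_loads_byteified` call `_byteify` (with `ignore_dicts=True`) on the result returned from `json.load` or `json.loads`, to handle the case where the JSON text being decoded does not have a `dict` at the top level.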
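The need for that final call can be seen in a short sketch: when the JSON text contains no object at all, `object_hook` is never invoked, so nothing would be converted without it. `counting_hook` and `hook_calls` are hypothetical names for illustration.

```python
import json

hook_calls = []

def counting_hook(decoded_dict):
    # Remember every dict the decoder hands us.
    hook_calls.append(decoded_dict)
    return decoded_dict

# Neither document contains a JSON object, so the hook never runs.
top_level_list = json.loads('["a", ["b"]]', object_hook=counting_hook)
top_level_string = json.loads('"just a string"', object_hook=counting_hook)

print(len(hook_calls))  # 0
```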
