我正在努力解决这个问题。我有一个JSON文件,需要将其放到CSV中,如果结构是扁平的,没有深层嵌套的项目,就可以了。
但是在这种情况下,嵌套会使RACES我烦恼。
RACES
我将如何以这种格式获取数据:
VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
每个对象和对象中的每个种族?
{ "1": { "VENUE": "JOEBURG", "COUNTRY": "HAE", "ITW": "XAD", "RACES": { "1": { "NO": 1, "TIME": "12:35" }, "2": { "NO": 2, "TIME": "13:10" }, "3": { "NO": 3, "TIME": "13:40" }, "4": { "NO": 4, "TIME": "14:10" }, "5": { "NO": 5, "TIME": "14:55" }, "6": { "NO": 6, "TIME": "15:30" }, "7": { "NO": 7, "TIME": "16:05" }, "8": { "NO": 8, "TIME": "16:40" } } }, "2": { "VENUE": "FOOBURG", "COUNTRY": "ABA", "ITW": "XAD", "RACES": { "1": { "NO": 1, "TIME": "12:35" }, "2": { "NO": 2, "TIME": "13:10" }, "3": { "NO": 3, "TIME": "13:40" }, "4": { "NO": 4, "TIME": "14:10" }, "5": { "NO": 5, "TIME": "14:55" }, "6": { "NO": 6, "TIME": "15:30" }, "7": { "NO": 7, "TIME": "16:05" }, "8": { "NO": 8, "TIME": "16:40" } } }, ... }
我想这样输出到CSV:
VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME JOEBERG, HAE, XAD, 1, 12:35 JOEBERG, HAE, XAD, 2, 13:10 JOEBERG, HAE, XAD, 3, 13:40 ... ... FOOBURG, ABA, XAD, 1, 12:35 FOOBURG, ABA, XAD, 2, 13:10
所以首先我得到正确的密钥:
self.keys = self.data.keys() keys = ["DATA_KEY"] for key in self.keys: if type(self.data[key]) == dict: for k in self.data[key].keys(): if k not in keys: if type(self.data[key][k]) == unicode: keys.append(k) elif type(self.data[key][k]) == dict: self.subkey = k for sk in self.data[key][k].values(): for subkey in sk.keys(): subkey = "%s__%s" % (self.subkey, subkey) if subkey not in keys: keys.append(subkey)
然后添加数据:
但是如何?
对于您熟练的forlooper来说,这应该是一个有趣的过程。 ;-)
;-)
我只收集第一个对象的键,然后假定格式的其余部分是一致的。
以下代码还将嵌套对象限制为 一个 ;您没有指定当一个以上时应该发生的情况。具有两个或多个相同长度的嵌套结构可以工作(将它们“压缩”在一起),但是,如果您具有不同长度的结构,则需要做出明确的选择来处理它们。用空列压缩以填充或写出这些条目的乘积(A x B行,每次找到B条目都从A重复信息)。
import csv from operator import itemgetter with open(outputfile, 'wb') as outf: writer = None # will be set to a csv.DictWriter later for key, item in sorted(data.items(), key=itemgetter(0)): row = {} nested_name, nested_items = '', {} for k, v in item.items(): if not isinstance(v, dict): row[k] = v else: assert not nested_items, 'Only one nested structure is supported' nested_name, nested_items = k, v if writer is None: # build fields for each first key of each nested item first fields = sorted(row) # sorted keys of first item in key sorted order nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1]) fields.extend('__'.join((nested_name, k)) for k in nested_keys) writer = csv.DictWriter(outf, fields) writer.writeheader() for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)): row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items()) writer.writerow(row)
对于您的样本输入,将产生:
COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME HAE,XAD,JOEBURG,1,12:35 HAE,XAD,JOEBURG,2,13:10 HAE,XAD,JOEBURG,3,13:40 HAE,XAD,JOEBURG,4,14:10 HAE,XAD,JOEBURG,5,14:55 HAE,XAD,JOEBURG,6,15:30 HAE,XAD,JOEBURG,7,16:05 HAE,XAD,JOEBURG,8,16:40 ABA,XAD,FOOBURG,1,12:35 ABA,XAD,FOOBURG,2,13:10 ABA,XAD,FOOBURG,3,13:40 ABA,XAD,FOOBURG,4,14:10 ABA,XAD,FOOBURG,5,14:55 ABA,XAD,FOOBURG,6,15:30 ABA,XAD,FOOBURG,7,16:05 ABA,XAD,FOOBURG,8,16:40