我很好奇如何使用熊猫读取以下结构的嵌套json:
{ "number": "", "date": "01.10.2016", "name": "R 3932", "locations": [ { "depTimeDiffMin": "0", "name": "Spital am Pyhrn Bahnhof", "arrTime": "", "depTime": "06:32", "platform": "2", "stationIdx": "0", "arrTimeDiffMin": "", "track": "R 3932" }, { "depTimeDiffMin": "0", "name": "Windischgarsten Bahnhof", "arrTime": "06:37", "depTime": "06:40", "platform": "2", "stationIdx": "1", "arrTimeDiffMin": "1", "track": "" }, { "depTimeDiffMin": "", "name": "Linz/Donau Hbf", "arrTime": "08:24", "depTime": "", "platform": "1A-B", "stationIdx": "22", "arrTimeDiffMin": "1", "track": "" } ] }
这使数组保持为json。我宁愿将其扩展为列。
pd.read_json("/myJson.json", orient='records')
感谢您的第一个答案。我应该提一下我的问题:数组中嵌套属性的拼合不是强制性的。仅将[A,B,C]连接df.locations [‘name’]就可以了。
我的文件包含多个JSON对象(每行1个),我想保留number,date,name和location列。但是,我需要加入这些地点。
allLocations = "" isFirst = True for location in result.locations: if isFirst: isFirst = False allLocations = location['name'] else: allLocations += "; " + location['name'] allLocations
我在这里的方法似乎不是高效/熊猫风格。
您可以使用json_normalize:
json_normalize
import json from pandas.io.json import json_normalize with open('myJson.json') as data_file: data = json.load(data_file) df = json_normalize(data, 'locations', ['date', 'number', 'name'], record_prefix='locations_') print (df) locations_arrTime locations_arrTimeDiffMin locations_depTime \ 0 06:32 1 06:37 1 06:40 2 08:24 1 locations_depTimeDiffMin locations_name locations_platform \ 0 0 Spital am Pyhrn Bahnhof 2 1 0 Windischgarsten Bahnhof 2 2 Linz/Donau Hbf 1A-B locations_stationIdx locations_track number name date 0 0 R 3932 R 3932 01.10.2016 1 1 R 3932 01.10.2016 2 22 R 3932 01.10.2016
编辑:
你可以用read_json与解析name的DataFrame构造函数,并最后groupby与应用join:
read_json
name
DataFrame
groupby
join
df = pd.read_json("myJson.json") df.locations = pd.DataFrame(df.locations.values.tolist())['name'] df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index() print (df) date name number locations 0 2016-01-10 R 3932 Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...