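I'd like csv.DictReader to deduce the field names from the file. The docs say "If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.", but in my case the first row contains a title and it is the second row that contains the names.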
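I can't apply next(reader), as suggested for skipping a line in csv.DictReader under Python 3.2, because the field name assignment takes place when the reader is initialized (or else I'm doing it wrong).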
The csvfile (exported from Excel 2010, original source):
    CanVec v1.1.0,,,,,,,,,^M
    Entity,Attributes combination,"Specification Code Point","Specification Code Line","Specification Code Area",Generic Code,Theme,"GML - Entity name Shape - File name Point","GML - Entity name Shape - File name Line","GML - Entity name Shape - File name Area"^M
    Amusement park,Amusement park,,,2260012,2260009,LX,,,LX_2260009_2^M
    Auto wrecker,Auto wrecker,,,2360012,2360009,IC,,,IC_2360009_2^M
My code:
    import csv

    f = open(entities_table, 'rb')
    try:
        # Guess the CSV dialect from the first 1024 bytes, then rewind
        dialect = csv.Sniffer().sniff(f.read(1024))
        f.seek(0)
        reader = csv.DictReader(f, dialect=dialect)
        print 'I think the field names are:\n%s\n' % (reader.fieldnames)

        # Print only the first 20 rows
        i = 0
        for row in reader:
            if i < 20:
                print row
                i = i + 1
    finally:
        f.close()
Current results:
    I think the field names are:
    ['CanVec v1.1.0', '', '', '', '', '', '', '', '', '']
Desired result:
    I think the field names are:
    ['Entity','Attributes combination','"Specification Code Point"',...snip]
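I realize it would be expedient to simply delete the first line and carry on, but I'm trying to read the data in place as far as possible and keep manual intervention to a minimum.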
I used islice from itertools. My header was on the last line of a large preamble. I skipped past the preamble and used the header line as the field names:
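    import csv
    from itertools import islice

    # Pass over the preamble, counting lines until the header line is found
    with open(file, "r") as f:
        n = 0
        for line in f:
            n += 1
            if 'same_field_name' in line:  # line with field names was found
                # Strip the line ending so the last field name stays clean
                h = line.rstrip('\r\n').split(',')
                break

    # Reopen the file and skip the first n lines (preamble plus header line)
    f = islice(open(file, "r"), n, None)
    reader = csv.DictReader(f, fieldnames=h)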
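The same idea can probably be done in a single pass without reopening the file. Below is a minimal sketch, written for Python 3, and not part of the original answer. The helper name `dict_reader_after_preamble` and the `marker` argument are hypothetical, and the sample data is an abbreviated version of the file in the question. It reads lines until one contains the marker, parses that line with csv.reader so that quoted field names are handled, and hands the rest of the same iterator to csv.DictReader:

```python
import csv
import io


def dict_reader_after_preamble(lines, marker):
    """Skip lines until one contains `marker`; use that line as the header.

    `lines` is any iterable of text lines (for example an open file).
    This helper is hypothetical, written only for illustration.
    """
    lines = iter(lines)
    for line in lines:
        if marker in line:
            # Parse the header with csv.reader so quoted names are unquoted
            fieldnames = next(csv.reader([line]))
            # The iterator now sits just past the header line
            return csv.DictReader(lines, fieldnames=fieldnames)
    raise ValueError("no header line containing %r" % marker)


# Abbreviated sample data modelled on the file in the question
sample = io.StringIO(
    'CanVec v1.1.0,,,,,,,,,\n'
    'Entity,Attributes combination,"Specification Code Point",Theme\n'
    'Amusement park,Amusement park,,LX\n'
    'Auto wrecker,Auto wrecker,,IC\n'
)

reader = dict_reader_after_preamble(sample, "Entity")
rows = list(reader)
print(reader.fieldnames)
print(rows[0]["Theme"])
```

When the preamble is known to be exactly one line, as in the question, calling next(f) on the file object before constructing csv.DictReader(f) should be enough, since the reader then takes its field names from the first line it actually sees.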