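I have a text file with the content shown below. I want to split this file into multiple files (1.txt, 2.txt, 3.txt, ...). Each new output file should look like the desired outputs listed below. The code I tried does not split the input file correctly. How can I split the input file into multiple files?
My code: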
    #!/usr/bin/python

    with open("input.txt", "r") as f:
        a1=[]
        a2=[]
        a3=[]
        for line in f:
            if not line.strip() or line.startswith('A') or line.startswith('$$'): continue
            row = line.split()
            a1.append(str(row[0]))
            a2.append(float(row[1]))
            a3.append(float(row[2]))
    f = open('1.txt','a')
    f = open('2.txt','a')
    f = open('3.txt','a')
    f.write(str(a1))
    f.close()
Input file:
A x k .. $$
A z m .. $$
A B l .. $$
Desired output 1.txt:
A x k .. $$
Desired output 2.txt:
A z m .. $$
Desired output 3.txt:
A B l .. $$
Try the re.findall() function:
    import re

    with open('input.txt', 'r') as f:
        data = f.read()

    found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

    [open(str(i)+'.txt', 'w').write(found[i-1]) for i in range(1, len(found)+1)]
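To see what the pattern captures without touching any files, here is a minimal sketch that runs it on an in-memory string. The `sample` text is hypothetical: it assumes that `A` and `$$` each sit on their own line, which is what the pattern expects. Note that `re.S` is the flag doing the work here (it lets `.` match newlines); `re.M` only changes how `^` and `$` behave, and this pattern uses neither anchor.

```python
import re

# Hypothetical sample: assumes "A" and "$$" each sit on their own line.
sample = "A\nx\nk\n$$\n\nA\nz\nm\n$$\nA\nB\nl\n$$\n"

# Same pattern as in the answer. The lazy ".*?" stops at the first "\n$$",
# and the surrounding "\n*" swallows blank lines between blocks.
pattern = r'\n*(A.*?\n\$\$)\n*'
found = re.findall(pattern, sample, re.M | re.S)

# Each element is one block, without leading or trailing newlines.
for number, block in enumerate(found, start=1):
    print(number, repr(block))
```

Because the pattern has a single capture group, `re.findall()` returns a list of the captured blocks rather than the full matches.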
A more compact version, for the first 3 occurrences only:
    import re

    found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)

    [open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found[:3]]
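Some explanation:

    found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

finds every occurrence that matches the given RegEx and puts them all into a list called found.

    [open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found]

iterates (using a list comprehension) over all elements of the found list, creates a text file for each element (named "index of the element + 1.txt"), and writes that element (the occurrence) to the file.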
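One caveat with `found.index(f)`: it returns the position of the first element equal to `f`, so two identical blocks would be written to the same file name. A minimal sketch of a variant that numbers the blocks with `enumerate()` and closes each file with a `with` statement follows. The helper names `split_blocks` and `write_blocks`, and the `limit` and `out_dir` parameters, are made up for this illustration.

```python
import re
from pathlib import Path


def split_blocks(text, limit=None):
    """Return the A ... $$ blocks in text, optionally only the first `limit`."""
    found = re.findall(r'\n*(A.*?\n\$\$)\n*', text, re.S)
    return found if limit is None else found[:limit]


def write_blocks(blocks, out_dir='.'):
    """Write each block to 1.txt, 2.txt, ... in out_dir and return the paths."""
    paths = []
    # enumerate() numbers by position, so identical blocks get distinct files.
    for number, block in enumerate(blocks, start=1):
        path = Path(out_dir) / f'{number}.txt'
        with open(path, 'w') as fout:  # the file is closed when the block ends
            fout.write(block)
        paths.append(path)
    return paths
```

Usage would look like `write_blocks(split_blocks(open('input.txt').read(), limit=3))`, assuming the input file is named input.txt as in the question.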
Another version, without RegEx:
    blocks_to_read = 3
    blk_begin = 'A'
    blk_end = '$$'

    with open('35916503.txt', 'r') as f:
        fn = 1
        data = []
        write_block = False
        for line in f:
            if fn > blocks_to_read:
                break
            line = line.strip()
            if line == blk_begin:
                write_block = True
            if write_block:
                data.append(line)
            if line == blk_end:
                write_block = False
                with open(str(fn) + '.txt', 'w') as fout:
                    fout.write('\n'.join(data))
                data = []
                fn += 1
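The same line-by-line logic can also be sketched as a generator, which separates finding the blocks from writing them. This is only an illustration under the same assumption as the loop above (a line containing just `A` opens a block and a line containing just `$$` closes it); the name `iter_blocks` is made up here.

```python
from itertools import islice


def iter_blocks(lines, begin='A', end='$$'):
    """Yield each block as a list of stripped lines, from `begin` through `end`."""
    block = None  # None means we are currently outside a block
    for raw in lines:
        line = raw.strip()
        if block is None and line == begin:
            block = []  # a new block starts here
        if block is not None:
            block.append(line)
            if line == end:
                yield block
                block = None  # back outside, ignore lines until the next begin


# In-memory example; an open file object can be passed in the same way.
lines = ["A\n", "x\n", "$$\n", "\n", "junk\n", "A\n", "z\n", "$$\n"]
first_blocks = list(islice(iter_blocks(lines), 3))  # at most the first 3 blocks
```

Each yielded block can then be joined with `'\n'.join(block)` and written to its numbered file.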
PS: Personally, I don't like this version; I would use the RegEx one.