I have an input file like this:
    This is a text block start
    This is the end

    And this is another
    with more than one line
    and another line.
The task is to read the file in sections separated by a special line, which in this case is an empty line, e.g. [out]:
[['This is a text block start', 'This is the end'], ['And this is another','with more than one line', 'and another line.']]
I have been getting the desired output by doing this:
    def per_section(it):
        """ Read a file and yield sections using empty line as delimiter """
        section = []
        for line in it:
            if line.strip('\n'):
                section.append(line)
            else:
                yield ''.join(section)
                section = []
        # yield any remaining lines as a section too
        if section:
            yield ''.join(section)
However, if the special line is one that starts with #, as in the following:
    # Some comments, maybe the title of the following section
    This is a text block start
    This is the end
    # Some other comments and also the title
    And this is another
    with more than one line
    and another line.
I have to do this:
    def per_section(it):
        """ Read a file and yield sections using lines starting with '#' as delimiter """
        section = []
        for line in it:
            if line[0] != "#":
                section.append(line)
            else:
                yield ''.join(section)
                section = []
        # yield any remaining lines as a section too
        if section:
            yield ''.join(section)
If I allow per_section() to take a delimiter parameter, I could try this:
    def per_section(it, delimiter='\n'):
        """ Read a file and yield sections using an empty line or a '#' line as delimiter """
        section = []
        for line in it:
            if line.strip('\n') and delimiter == '\n':
                section.append(line)
            elif delimiter == '#' and line[0] != "#":
                section.append(line)
            else:
                yield ''.join(section)
                section = []
        # yield any remaining lines as a section too
        if section:
            yield ''.join(section)
But is there a way to do this without hard-coding every possible delimiter?
How about passing a predicate?
    def per_section(it, is_delimiter=lambda x: x.isspace()):
        ret = []
        for line in it:
            if is_delimiter(line):
                if ret:
                    yield ret  # OR ''.join(ret)
                    ret = []
            else:
                ret.append(line.rstrip())  # OR ret.append(line)
        if ret:
            yield ret
Usage:
    with open('/path/to/file.txt') as f:
        sections = list(per_section(f))  # default delimiter

    with open('/path/to/file.txt.txt') as f:
        sections = list(per_section(f, lambda line: line.startswith('#')))  # comment
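To try the predicate version without touching the file system, here is a minimal self-contained sketch. It assumes the two sample inputs from the question and feeds them in through `io.StringIO`. The variable names `blank_separated`, `comment_separated` and `expected` are made up for this illustration.

```python
import io


def per_section(it, is_delimiter=lambda x: x.isspace()):
    """Yield lists of right-stripped lines, split wherever is_delimiter(line) is true."""
    ret = []
    for line in it:
        if is_delimiter(line):
            # Only emit non-empty sections, so a leading or repeated
            # delimiter does not produce an empty list.
            if ret:
                yield ret
                ret = []
        else:
            ret.append(line.rstrip())
    # Emit whatever is left after the last delimiter.
    if ret:
        yield ret


# Hypothetical in-memory stand-ins for the two files in the question.
blank_separated = io.StringIO(
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
comment_separated = io.StringIO(
    "# Some comments, maybe the title of the following section\n"
    "This is a text block start\n"
    "This is the end\n"
    "# Some other comments and also the title\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)

expected = [
    ["This is a text block start", "This is the end"],
    ["And this is another", "with more than one line", "and another line."],
]

by_blank = list(per_section(blank_separated))
by_comment = list(per_section(comment_separated, lambda line: line.startswith("#")))

print(by_blank == expected, by_comment == expected)
```

Note that with the default predicate a line containing only spaces or tabs also counts as a delimiter, because `str.isspace()` is true for any non-empty all-whitespace string. Pass something like `lambda line: line == "\n"` if only truly empty lines should split sections.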