In Python, for a binary file, I can write this:

    buf_size = 1024 * 64  # this is an important size...
    with open(file, "rb") as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            # deal with the data....

For a text file that I want to read line by line, I can write this:

    with open(file, "r") as file:
        for line in file:
            # deal with each line....

Which is shorthand for:

    with open(file, "r") as file:
        for line in iter(file.readline, ""):
            # deal with each line....

This idiom is documented in PEP 234, but I have not been able to find a similar idiom for binary files.
I have tried this:

    >>> with open('dups.txt','rb') as f:
    ...     for chunk in iter(f.read,''):
    ...         i+=1
    ...
    >>> i
    1 # 30 MB file, i==1 means read in one go...

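I tried putting iter(f.read(buf_size), '') but that is an error because of the parentheses after the callable in iter() (strictly a TypeError rather than a syntax error: the parentheses call f.read right away, and iter() wants the callable itself).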
I know I can write a function, but is there a way to use the default idiom of for chunk in file: with a buffer size rather than line-oriented reads?
Thanks for putting up with a Python newbie trying to write his first nontrivial and idiomatic Python script.
I don't know of any built-in way to do this, but a wrapper function is easy enough to write:

    def read_in_chunks(infile, chunk_size=1024*64):
        while True:
            chunk = infile.read(chunk_size)
            if chunk:
                yield chunk
            else:
                # The chunk was empty, which means we're at the end
                # of the file
                return

Then at the interactive prompt:

    >>> from chunks import read_in_chunks
    >>> infile = open('quicklisp.lisp')
    >>> for chunk in read_in_chunks(infile):
    ...     print(chunk)
    ...
    <contents of quicklisp.lisp in chunks>

Of course, you can easily adapt this to use a with block:

    with open('quicklisp.lisp') as infile:
        for chunk in read_in_chunks(infile):
            print(chunk)

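If you want to sanity-check the generator without a real file on disk, an in-memory io.BytesIO object behaves like a file opened in "rb" mode. This is only a minimal sketch for Python 3: the 10-byte sample and the 4-byte chunk size are made-up values chosen so the result is easy to eyeball, and the function is repeated here so the snippet runs on its own.

```python
import io


def read_in_chunks(infile, chunk_size=1024 * 64):
    """Yield successive chunks from infile until read() returns nothing."""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            # An empty result means we have reached the end of the file.
            return
        yield chunk


# Stand-in for a binary file: 10 bytes, read 4 bytes at a time
# (both numbers are illustrative, not from the original post).
sample = io.BytesIO(b"0123456789")
chunks = list(read_in_chunks(sample, chunk_size=4))
print(chunks)  # [b'0123', b'4567', b'89']
```

The last chunk is shorter than chunk_size, which is expected: read() returns whatever is left and then an empty bytes object on the next call.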
You can eliminate the if statement like this:

    def read_in_chunks(infile, chunk_size=1024*64):
        chunk = infile.read(chunk_size)
        while chunk:
            yield chunk
            chunk = infile.read(chunk_size)

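For what it's worth, the two-argument iter() form from the question can probably be made to work by binding the buffer size ahead of time with functools.partial, so that iter() receives a callable instead of the result of a call. The sketch below assumes Python 3, where a file opened in "rb" mode returns bytes, so the sentinel has to be b"" rather than ''. The helper name iter_chunks and the sample sizes are hypothetical, just for illustration.

```python
import io
from functools import partial


def iter_chunks(f, buf_size=1024 * 64):
    """Hypothetical helper: iterate over f in buf_size pieces.

    partial(f.read, buf_size) is a zero-argument callable, which is what
    the two-argument form of iter() expects. Iteration stops when the
    callable returns the sentinel b"" (end of a binary file).
    """
    return iter(partial(f.read, buf_size), b"")


# Stand-in for a binary file: 150 bytes read 64 at a time (made-up sizes).
data = io.BytesIO(b"x" * 150)
sizes = [len(chunk) for chunk in iter_chunks(data, buf_size=64)]
print(sizes)  # [64, 64, 22]
```

With a real file this would read something like for chunk in iter(partial(f.read, buf_size), b""): inside the with block. If the file is opened in text mode, the sentinel would need to be "" instead, since the sentinel is compared by equality against what read() returns.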