小编典典

Python file iteration over a binary file using a newer idiom

python

In Python, for a binary file, I can write this:

buf_size=1024*64           # this is an important size...
with open(file, "rb") as f:
   while True:
      data=f.read(buf_size)
      if not data: break
      # deal with the data....

For a text file that I want to read line by line, I can write this:

with open(file, "r") as file:
   for line in file:
       # deal with each line....

which is shorthand for:

with open(file, "r") as file:
   for line in iter(file.readline, ""):
       # deal with each line....

This idiom is documented in PEP 234, but I have not been able to find a similar idiom for binary files.

I have tried this:

>>> with open('dups.txt','rb') as f:
...    for chunk in iter(f.read,''):
...       i+=1

>>> i
1                # 30 MB file, i==1 means read in one go...

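I tried putting iter(f.read(buf_size),'') but that fails because of the parentheses after the callable in iter(): the read is called once on the spot, so iter() receives its result rather than a callable.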

I know I can write a function, but is there a way to use the default idiom of for chunk in file: with a buffer size instead of line-oriented reads?

Thanks for putting up with a Python newbie trying to write his first nontrivial and idiomatic Python script.


2020-12-20

1 answer

小编典典

I don't know of any built-in way to do this, but a wrapper function is easy enough to write:

def read_in_chunks(infile, chunk_size=1024*64):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return

Then at the interactive prompt:

>>> from chunks import read_in_chunks
>>> infile = open('quicklisp.lisp')
>>> for chunk in read_in_chunks(infile):
...     print(chunk)
... 
<contents of quicklisp.lisp in chunks>

Of course, you can easily adapt this to use a with block:

with open('quicklisp.lisp') as infile:
    for chunk in read_in_chunks(infile):
        print(chunk)

You can also eliminate the if statement like this:

def read_in_chunks(infile, chunk_size=1024*64):
    chunk = infile.read(chunk_size)
    while chunk:
        yield chunk
        chunk = infile.read(chunk_size)
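
For what it's worth, the two-argument iter(callable, sentinel) form from the question can probably be made to work for fixed-size reads by binding the size with functools.partial, so that iter() still receives a callable. The sketch below is only one way to do it: iter_chunks is a hypothetical helper name, and it assumes Python 3, where a file opened in "rb" mode returns b"" at end of file (so the sentinel has to be b"", not "").

```python
from functools import partial


def iter_chunks(path, chunk_size=64 * 1024):
    """Yield successive chunks of a binary file (hypothetical helper)."""
    with open(path, "rb") as f:
        # partial(f.read, chunk_size) is a zero-argument callable;
        # iter() keeps calling it until it returns the sentinel b"".
        for chunk in iter(partial(f.read, chunk_size), b""):
            yield chunk
```

Usage would look like for chunk in iter_chunks('dups.txt'): ... with every chunk except possibly the last being chunk_size bytes long for an ordinary file on disk.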
2020-12-20