"OSError: [Errno 22] Invalid argument" when calling read() on a huge file



python

I'm trying to write a small script that prints the checksum of a file (using some code from https://gist.github.com/Zireael-N/ed36997fd1a967d78cb2):

import sys
import os
import hashlib

file = '/Users/Me/Downloads/2017-11-29-raspbian-stretch.img'

with open(file, 'rb') as f:
    contents = f.read()
    print('SHA256 of file is %s' % hashlib.sha256(contents).hexdigest())

But I get the following error message:

Traceback (most recent call last):
  File "checksum.py", line 8, in <module>
    contents = f.read()
OSError: [Errno 22] Invalid argument

What am I doing wrong? I'm using Python 3 on macOS High Sierra.


2021-01-16

1 answer


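Over Python's history there have been multiple issues (fixed in recent versions) with reading more than 2-4 GB at once from a file handle, and f.read() with no size argument tries to read the entire file in a single call. (An unfixable version of the problem also occurs on 32-bit builds of Python, which simply lack the virtual address space to allocate the buffer; that one isn't I/O related, but it is seen most often when processing large files.) A workaround that works for hashing is to update the hash in fixed-size chunks, which is a good idea anyway, since counting on having more RAM than the file's size is a poor bet. The most straightforward approach is to change your code to: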

import hashlib

with open(file, 'rb') as f:  # `file` is the path defined in the question
    hasher = hashlib.sha256()  # Make empty hasher to update piecemeal
    while True:
        block = f.read(64 * (1 << 20)) # Read 64 MiB at a time; big, but not memory busting
        if not block:  # Reached EOF
            break
        hasher.update(block)  # Update with new block
print('SHA256 of file is %s' % hasher.hexdigest())  # Finalize to compute digest
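For reuse, the same loop can be wrapped in a function. This is a minimal sketch; the name `sha256_of_file` and its `chunk_size` parameter are hypothetical, not part of the original answer:

```python
import hashlib

CHUNK_SIZE = 64 * (1 << 20)  # 64 MiB, the same block size as the loop above


def sha256_of_file(path, chunk_size=CHUNK_SIZE):
    """Return the hex SHA-256 digest of the file at `path` (hypothetical helper).

    The file is read in blocks of at most `chunk_size` bytes, so no single
    read() call ever asks for the whole multi-GB file at once.
    """
    hasher = hashlib.sha256()
    with open(path, 'rb') as f:
        while True:
            block = f.read(chunk_size)  # at most chunk_size bytes per call
            if not block:  # empty bytes object means EOF
                break
            hasher.update(block)  # feed this block into the running hash
    return hasher.hexdigest()
```

With it, the question's script reduces to print('SHA256 of file is %s' % sha256_of_file(file)). The digest does not depend on the chunk size, because hashlib objects hash incrementally; a smaller chunk_size just means more read() calls and less memory per call.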

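If you prefer, you can "simplify" the loop with two-argument iter and a bit of functools magic (this needs import functools at the top of the script), replacing the entire while loop with: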

for block in iter(functools.partial(f.read, 64 * (1 << 20)), b''):
    hasher.update(block)
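To see what two-argument iter() does, here is a small self-contained sketch that runs the same pattern on an in-memory io.BytesIO stream instead of a real file. The helper name sha256_of_stream and the tiny chunk sizes are illustrative assumptions:

```python
import functools
import hashlib
import io


def sha256_of_stream(f, chunk_size=64 * (1 << 20)):
    """Hash a binary file object chunk by chunk (hypothetical helper)."""
    hasher = hashlib.sha256()
    # iter(callable, sentinel) keeps calling f.read(chunk_size) and stops
    # as soon as the call returns the sentinel b'' (EOF for a binary stream).
    for block in iter(functools.partial(f.read, chunk_size), b''):
        hasher.update(block)
    return hasher.hexdigest()


# The blocks the loop sees for a 10-byte stream read 4 bytes at a time:
chunks = list(iter(functools.partial(io.BytesIO(b'0123456789').read, 4), b''))
print(chunks)  # [b'0123', b'4567', b'89']
```

The sentinel has to be b'' rather than '': a file opened in binary mode returns bytes, and b'' != '' in Python 3, so with a str sentinel the loop would never terminate.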

On Python 3.8+, the walrus operator, :=, makes it simpler still, with no imports and no unreadable code:

while block := f.read(64 * (1 << 20)):  # Assigns and tests result in conditional!
    hasher.update(block)
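A complete walrus-based version, as a sketch (the function name sha256_hexdigest is hypothetical). It also prefers hashlib.file_digest when it exists; that function was added in Python 3.11 and runs the chunked read loop internally, so on those versions the chunk_size argument is simply unused:

```python
import hashlib


def sha256_hexdigest(path, chunk_size=64 * (1 << 20)):
    """Hex SHA-256 of the file at `path` (hypothetical helper; needs Python 3.8+)."""
    with open(path, 'rb') as f:
        if hasattr(hashlib, 'file_digest'):
            # Python 3.11+: the standard library does the chunked reading;
            # chunk_size is ignored on this path.
            return hashlib.file_digest(f, 'sha256').hexdigest()
        hasher = hashlib.sha256()
        # read() returns b'' at EOF, which is falsy and ends the loop.
        while block := f.read(chunk_size):
            hasher.update(block)
        return hasher.hexdigest()
```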

2021-01-16