Python OCR module for Linux?

小编典典


python

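I'm looking for an easy-to-use OCR Python module for Linux. I found pytesser
(http://code.google.com/p/pytesser/), but it ships with a .exe executable.

I tried changing the code to run it through wine, and it does work, but it is far too slow and not really a good idea.

Is there a Linux alternative that is just as easy to use?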


2021-01-20

1 answer

小编典典

You can simply wrap tesseract in a function:

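import os
import subprocess
import tempfile

def ocr(path):
    # Reserve a unique base name; tesseract appends '.txt' to it for its output.
    temp = tempfile.NamedTemporaryFile(delete=False)
    temp.close()

    # Run `tesseract <image> <output base>` and wait for it to finish.
    process = subprocess.Popen(['tesseract', path, temp.name], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    process.communicate()

    # Read the recognized text from the file tesseract wrote.
    with open(temp.name + '.txt', 'r') as handle:
        contents = handle.read()

    # Clean up both the output file and the placeholder temp file.
    os.remove(temp.name + '.txt')
    os.remove(temp.name)

    return contents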
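
A variation on the wrapper above, offered only as a sketch: reasonably recent tesseract builds accept `stdout` as the output base, which avoids the temporary file entirely. This assumes your installed tesseract supports that, and that the `-l` language flag is wanted. The names `build_tesseract_command` and `ocr_to_string` are made up for illustration.

```python
import subprocess


def build_tesseract_command(image_path, lang=None):
    """Build the argument list for `tesseract <image> stdout [-l LANG]`."""
    command = ['tesseract', image_path, 'stdout']
    if lang is not None:
        # -l selects the recognition language, e.g. 'eng'.
        command += ['-l', lang]
    return command


def ocr_to_string(image_path, lang=None, run=subprocess.run):
    """Return the text tesseract prints to standard output.

    `run` is a hypothetical injection point so the call can be stubbed
    when tesseract is not installed; it defaults to subprocess.run.
    """
    result = run(
        build_tesseract_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output as text
        check=True,               # raise CalledProcessError on failure
    )
    return result.stdout
```

Calling `ocr_to_string('scan.png', lang='eng')` would then return the recognized text directly, and a failed run raises an exception instead of leaving a missing output file to trip over.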

If you want document segmentation and other more advanced features, try OCRopus.

2021-01-20