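I'm having a problem with pytesseract. I need to configure Tesseract so that it accepts a single digit, and only digits, because the digit 0 is often confused with the letter 'O'.
Something like this: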
target = pytesseract.image_to_string(im,config='-psm 7',config='outputbase digits')
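tesseract-4.0.0a supports the page segmentation modes (psm) listed below. For single-character recognition, set psm = 10. If your text contains only digits, you can also set tessedit_char_whitelist=0123456789.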
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
Here is an example of calling image_to_string with several arguments.
target = pytesseract.image_to_string(image, lang='eng', boxes=False,
                                     config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
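To make the call above a little easier to reuse, here is a minimal sketch that builds the config string in one place and filters the result. All options go into a single config string, since a keyword argument such as config cannot be passed twice. The helper names build_digit_config, keep_digits and read_digit are hypothetical, not part of pytesseract. The sketch leaves out the boxes argument, which I believe newer pytesseract releases no longer accept. The digit filter is only a safety net: some Tesseract 4.0 builds reportedly ignore the whitelist when the LSTM engine is used.

```python
import re


def build_digit_config(psm=10, oem=3, whitelist="0123456789"):
    """Assemble one Tesseract config string for digit-only recognition.

    psm=10 treats the image as a single character; use psm=7 for a
    single line of digits instead.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def keep_digits(text):
    """Drop everything that is not 0-9 (whitespace, form feeds, stray letters)."""
    return re.sub(r"[^0-9]", "", text)


def read_digit(image, lang="eng"):
    """OCR a PIL image and return only the digits found (hypothetical helper)."""
    # Imported here so the helpers above can be used without pytesseract installed.
    import pytesseract

    raw = pytesseract.image_to_string(image, lang=lang, config=build_digit_config())
    return keep_digits(raw)
```

With Pillow installed, usage would look like read_digit(Image.open("digit.png")), where digit.png is a placeholder file name.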
Hope this helps.