Tesseract is an open-source OCR engine that can be installed as a package on Ubuntu.
There is also a frontend called gImageReader, for reference.
[Link: https://sourceforge.net/projects/gimagereader/]
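For basic usage, pass the image file name followed by - -l eng, as below. The - in the output-file position makes tesseract print the result to stdout instead of writing a file.
$ tesseract images/eurotext.png - -l eng
[Link: https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html]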
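Since the result goes to stdout, it can be captured or redirected like any other command output. Below is a minimal sketch of a wrapper, not an official recipe. The function name ocr_to_stdout and the TESSERACT_BIN variable are names I made up for illustration, and it assumes tesseract prints its warnings on stderr, so they are simply discarded.

```shell
#!/bin/sh
# TESSERACT_BIN is a hypothetical override for this sketch; it defaults to the real binary.
: "${TESSERACT_BIN:=tesseract}"

# ocr_to_stdout IMAGE [LANG]
# Prints the recognized text on stdout. LANG defaults to eng.
# Assumption: diagnostics go to stderr, which is thrown away here.
ocr_to_stdout() {
    image=$1
    lang=${2:-eng}
    "$TESSERACT_BIN" "$image" - -l "$lang" 2>/dev/null
}
```

For example, ocr_to_stdout images/eurotext.png > eurotext.txt would save only the recognized text.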
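--psm sets the page segmentation mode, i.e. how the image is read. The modes differ in whether they include OSD (Orientation and Script Detection),
and with --psm 1 text as short as 2 characters can be recognized.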
+
Setting --psm 10 makes tesseract treat the image as a single character.
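As a rough sketch of how the mode could be passed from a script: the helper below only accepts the mode numbers 0 to 13 that tesseract --help-extra lists and then calls tesseract with --psm. The names ocr_with_psm and TESSERACT_BIN, and the file name char.png in the usage note, are hypothetical.

```shell
#!/bin/sh
# TESSERACT_BIN is a hypothetical override for this sketch; it defaults to the real binary.
: "${TESSERACT_BIN:=tesseract}"

# ocr_with_psm IMAGE PSM
# Runs OCR to stdout with the given page segmentation mode.
ocr_with_psm() {
    image=$1
    psm=$2
    case $psm in
        [0-9]|1[0-3]) ;;   # 0..13 are the modes listed by --help-extra
        *) echo "unsupported psm: $psm" >&2; return 2 ;;
    esac
    "$TESSERACT_BIN" "$image" - --psm "$psm"
}
```

ocr_with_psm char.png 10 would then run the single-character mode on char.png.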
$ tesseract --help-extra
Usage:
  tesseract --help | --help-extra | --help-psm | --help-oem | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  --dpi VALUE           Specify DPI for input image.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
   0    Orientation and script detection (OSD) only.
   1    Automatic page segmentation with OSD.
   2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
   3    Fully automatic page segmentation, but no OSD. (Default)
   4    Assume a single column of text of variable sizes.
   5    Assume a single uniform block of vertically aligned text.
   6    Assume a single uniform block of text.
   7    Treat the image as a single text line.
   8    Treat the image as a single word.
   9    Treat the image as a single word in a circle.
  10    Treat the image as a single character.
  11    Sparse text. Find as much text as possible in no particular order.
  12    Sparse text with OSD.
  13    Raw line. Treat the image as a single text line,
        bypassing hacks that are Tesseract-specific.

OCR Engine modes: (see https://github.com/tesseract-ocr/tesseract/wiki#linux)
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

Single options:
  -h, --help            Show minimal help message.
  --help-extra          Show extra help for advanced users.
  --help-psm            Show page segmentation modes.
  --help-oem            Show OCR Engine modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters.
[Link: https://kokokorin-bigbox.tistory.com/53]
In my tests, a single character does not seem to be recognized no matter what I try. Is there some way around this?
$ tesseract a5.png - --psm 3
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 406
ak

$ tesseract a5.png - --psm 2
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 406
Orientation: 0
WritingDirection: 0
TextlineOrder: 2
Deskew angle: 0.0000

$ tesseract a5.png - --psm 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 406
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 2 blob text block, but using orientation anyway: 0
ak

$ tesseract a5.png - --psm 0
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 406
Too few characters. Skipping this page
Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
Error during processing.

$ tesseract a6.png - --psm 0
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 294
Too few characters. Skipping this page
Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
Error during processing.

$ tesseract a6.png - --psm 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 294
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 1 blob text block, but using orientation anyway: 0
Empty page!!
Estimating resolution as 294
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 1 blob text block, but using orientation anyway: 0
Empty page!!

$ tesseract a6.png - --psm 2
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 294
Empty page!!

$ tesseract a6.png - --psm 3
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 294
Empty page!!
Estimating resolution as 294
Empty page!!
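The runs above only cover modes 0 to 3, and every one of them prints the "Invalid resolution 0 dpi" warning. A possible next step, which I have not verified to help, is to sweep the remaining modes with a script and pass --dpi explicitly. The sketch below does only that; sweep_psm and TESSERACT_BIN are made-up names, and the DPI value in the usage note is an arbitrary assumption, not a recommended setting.

```shell
#!/bin/sh
# TESSERACT_BIN is a hypothetical override for this sketch; it defaults to the real binary.
: "${TESSERACT_BIN:=tesseract}"

# sweep_psm IMAGE DPI MODE...
# Runs the same image through each listed page segmentation mode and
# prints a header line followed by whatever text tesseract recognized.
sweep_psm() {
    image=$1
    dpi=$2
    shift 2
    for psm in "$@"; do
        printf '== psm %s ==\n' "$psm"
        # Diagnostics on stderr are dropped; a non-zero exit status is reported instead.
        "$TESSERACT_BIN" "$image" - --dpi "$dpi" --psm "$psm" 2>/dev/null \
            || printf '(tesseract exited with an error)\n'
    done
}
```

For example, sweep_psm a6.png 300 6 7 8 10 13 would try the single-block, single-line, single-word, single-character and raw-line modes on the one-character image. Whether any of them actually recognizes it is something the sweep would show, not something I can promise.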