HOCR2DJVUSED(1) | hocr2djvused manual | HOCR2DJVUSED(1) |
NAME
hocr2djvused - hOCR to djvused script converter
SYNOPSIS
hocr2djvused [option...] [hocr-file...]
DESCRIPTION
hocr2djvused reads one or more hOCR[1] files (as produced by OCRopus[2], Cuneiform[3], or Tesseract[4]) and converts them to a djvused script.
Unless a filename is explicitly provided on the command line, hOCR is read from the standard input.
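As a minimal sketch of how the generated script might be used, the helper below converts one hOCR file and applies the result to an existing DjVu file. The function name apply_hocr and its arguments are hypothetical, and the sketch assumes that hocr2djvused writes the script to standard output and that the script already addresses the intended page of the DjVu file. The djvused options used are -f (read commands from a file) and -s (save the document).

```shell
# apply_hocr HOCR_FILE SCRIPT_FILE DJVU_FILE
# Hypothetical helper: converts one hOCR file to a djvused script,
# then runs that script against an existing DjVu file and saves it.
apply_hocr() {
    hocr_file=$1
    script_file=$2
    djvu_file=$3

    # Assumption: hocr2djvused writes the djvused script to standard output.
    hocr2djvused "$hocr_file" > "$script_file" || return 1

    # djvused: -f reads commands from the script file, -s saves the document.
    djvused -f "$script_file" -s "$djvu_file"
}
```

A call might look like `apply_hocr page.html page.djvused book.djvu`, where all three file names are placeholders. Keeping the intermediate script on disk makes it possible to inspect it before the DjVu file is modified.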
OPTIONS
Text segmentation options
-t lines, --details=lines
-t words, --details=words
This is the default.
-t chars, --details=chars
--word-segmentation=simple
This is the default, despite being linguistically incorrect.
--word-segmentation=uax29
Use the Unicode Text Segmentation[5] algorithm to break lines into words. This option breaks the assumption made by some DjVu tools that words are separated by spaces, and is therefore not recommended.
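To see what the detail levels change in practice, one option is to generate a script per level from the same input and compare the results. The sketch below does that; the helper name details_scripts and the output naming scheme are hypothetical, and it again assumes the script is written to standard output.

```shell
# details_scripts HOCR_FILE PREFIX
# Hypothetical helper: writes PREFIX-lines.djvused, PREFIX-words.djvused
# and PREFIX-chars.djvused so the three detail levels can be compared.
details_scripts() {
    hocr_file=$1
    prefix=$2
    for level in lines words chars; do
        # One run per value accepted by --details.
        hocr2djvused --details="$level" "$hocr_file" > "$prefix-$level.djvused" || return 1
    done
}
```

For example, `details_scripts page.html page` (placeholder names) would leave three script files that can be compared with diff(1).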
Other options
--rotation=n
--page-size=widthxheight
This option is required for hOCR generated by Cuneiform versions earlier than 0.8; it is superfluous otherwise.
--html5
Parse the input with an HTML5 parser[6].
--fix-utf8
This option might be needed for hOCR generated by Cuneiform[3] or Tesseract[4].
--version
-h, --help
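The options above can be combined. The following sketch targets hOCR produced by Cuneiform older than 0.8, which needs --page-size; it also passes --fix-utf8, on the assumption that the input may contain malformed UTF-8. The helper name convert_legacy_cuneiform is hypothetical, and the width and height must be supplied by the caller because the manual does not state how to determine them.

```shell
# convert_legacy_cuneiform HOCR_FILE WIDTH HEIGHT
# Hypothetical helper for hOCR from Cuneiform older than 0.8:
# passes the page size explicitly and enables --fix-utf8.
convert_legacy_cuneiform() {
    hocr_file=$1
    width=$2
    height=$3
    # --page-size expects the form WIDTHxHEIGHT.
    hocr2djvused --page-size="${width}x${height}" --fix-utf8 "$hocr_file"
}
```

A call such as `convert_legacy_cuneiform page.html 2550 3300` uses placeholder dimensions; the real values should match the scanned page.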
BUGS
Please report bugs at: https://github.com/jwilk/ocrodjvu/issues
SEE ALSO
djvu(1), ocrodjvu(1), djvu2hocr(1), djvused(1)
NOTES
1. hOCR
2. OCRopus
3. Cuneiform
4. Tesseract
5. Unicode Text Segmentation
6. HTML5 parser
2019-02-06 | ocrodjvu 0.11 |