DJVU2HOCR(1) | djvu2hocr manual | DJVU2HOCR(1) |
NAME¶
djvu2hocr - DjVu to hOCR converter
SYNOPSIS¶
djvu2hocr [option...] djvu-file
djvu2hocr {--version | --help | -h}
DESCRIPTION¶
djvu2hocr converts hidden text from a DjVu file to the hOCR[1] format.
OPTIONS¶
Input selection options¶
-p, --pages=page-range
Specifies pages to convert. page-range is a
comma-separated list of sub-ranges. Each sub-range is either a single page
(e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are
numbered from 1.
The default is to convert all pages.
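As a rough illustration of the page-range syntax described above, the following Python sketch expands a page-range string into a list of page numbers. The function name parse_page_range is hypothetical, and the code is not taken from djvu2hocr itself; the validation rules (pages start at 1, a range must not run backwards) are assumptions based on the description.

```python
def parse_page_range(spec):
    """Expand a page-range string such as '17,37-42' into a list of pages.

    Hypothetical helper for illustration only; djvu2hocr's own parser
    may accept or reject inputs differently.
    """
    pages = []
    for sub_range in spec.split(","):
        sub_range = sub_range.strip()
        if "-" in sub_range:
            # Contiguous range of pages, e.g. "37-42"
            first, last = (int(part) for part in sub_range.split("-", 1))
        else:
            # Single page, e.g. "17"
            first = last = int(sub_range)
        # Assumption: pages are numbered from 1 and ranges run forwards
        if first < 1 or last < first:
            raise ValueError(f"invalid sub-range: {sub_range!r}")
        pages.extend(range(first, last + 1))
    return pages


print(parse_page_range("17,37-42"))  # [17, 37, 38, 39, 40, 41, 42]
```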
Text segmentation options¶
--word-segmentation=simple
Use the same word segmentation as found in the DjVu file.
This is the default.
--word-segmentation=uax29
Use the Unicode Text Segmentation[2] algorithm to
break lines into words, possibly fixing word segmentation found in the DjVu
file.
HTML output options¶
--title=title
Specifies the document title.
The default is “DjVu hidden text layer”.
--css=style
Add the specified CSS style to the document.
For example, --css='.ocrx_line { display: block; }' can be used to visually preserve line breaks.
Other options¶
--version
Output version information and exit.
-h, --help
Display help and exit.
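The options above can be combined in a single invocation. The Python sketch below assembles an argument list using only the options documented here. The helper build_djvu2hocr_command and the file name book.djvu are hypothetical, and the sketch only builds and prints the command line; it does not run djvu2hocr.

```python
import shlex


def build_djvu2hocr_command(djvu_file, pages=None, word_segmentation=None,
                            title=None, css=None):
    """Build an argument list for djvu2hocr (hypothetical helper)."""
    command = ["djvu2hocr"]
    if pages is not None:
        command.append(f"--pages={pages}")
    if word_segmentation is not None:
        # Only the two values documented above are accepted here
        if word_segmentation not in ("simple", "uax29"):
            raise ValueError(f"unknown word segmentation: {word_segmentation!r}")
        command.append(f"--word-segmentation={word_segmentation}")
    if title is not None:
        command.append(f"--title={title}")
    if css is not None:
        command.append(f"--css={css}")
    # The DjVu file comes last, as in the synopsis
    command.append(djvu_file)
    return command


command = build_djvu2hocr_command(
    "book.djvu",  # placeholder file name
    pages="17,37-42",
    word_segmentation="uax29",
    css=".ocrx_line { display: block; }",
)
# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(command))
```

Keeping the arguments as a list avoids shell quoting problems with values such as the CSS rule; the same list could be handed to subprocess.run on a system where djvu2hocr is installed.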
PORTABILITY¶
djvu2hocr uses a custom extension to hOCR to retain characters that cannot be directly represented in an HTML/XML document. For example, the control character BEL (^G, U+0007) is converted into the following HTML chunk: <span class="djvu_char" title="#x07"> </span>
BUGS¶
Please report bugs at: https://github.com/jwilk/ocrodjvu/issues
SEE ALSO¶
djvu(1), hocr2djvused(1), ocrodjvu(1)
NOTES¶
1. hOCR
2. Unicode Text Segmentation
2018-07-12 | djvu2hocr 0.10.4 |