| PDF2DJVU(1) | pdf2djvu manual | PDF2DJVU(1) | 
NAME
pdf2djvu - creates DjVu files from PDF files
SYNOPSIS
pdf2djvu
  [{ -o | --output} output-djvu-file]
  [option...] pdf-file...
pdf2djvu
  { -i | --indirect} index-djvu-file
  [option...] pdf-file...
pdf2djvu
  { --version | --help | -h}
DESCRIPTION
This program creates a DjVu file from one or more Portable Document Format files.
OPTIONS
Document type, file names
-o, --output=output-djvu-file
Generate a bundled multi-page document. Write
  the file into output-djvu-file instead of standard output.
-i, --indirect=index-djvu-file
Generate an indirect multi-page document. Use
  index-djvu-file as the index file name; put the component files into
  the same directory. The directory must exist and be writable.
--pageid-template=template
Specifies the naming scheme for page
  identifiers. Consult the “TEMPLATE LANGUAGE” section for the
  template language description.
 
The default template is “p{page:04*}.djvu”.
 
For portability reasons, page identifiers:
•must consist only of lowercase ASCII
  letters, digits, _, +, - and dot,
•cannot start with a +, - or a
  dot,
•cannot contain two consecutive
  dots,
•must end with the .djvu or the .djv
  extension.
--pageid-prefix=prefix
Equivalent to “--pageid-template=prefix{page:04*}.djvu”.
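The portability rules for page identifiers can be checked mechanically. The following is a minimal sketch in Python, not code taken from pdf2djvu; the function name is_portable_page_id is hypothetical, and how pdf2djvu itself reports a violation is not covered here.

```python
import re


def is_portable_page_id(page_id: str) -> bool:
    """Check a page identifier against the four portability rules (illustrative only)."""
    # Rule 1: only lowercase ASCII letters, digits, '_', '+', '-' and '.'.
    if not re.fullmatch(r"[a-z0-9_+\-.]+", page_id):
        return False
    # Rule 2: must not start with '+', '-' or '.'.
    if page_id[0] in "+-.":
        return False
    # Rule 3: must not contain two consecutive dots.
    if ".." in page_id:
        return False
    # Rule 4: must end with the .djvu or the .djv extension.
    return page_id.endswith((".djvu", ".djv"))
```

For instance, an identifier produced by the default template, such as p0001.djvu, passes all four checks, while P0001.djvu fails the first one because of the uppercase letter.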
--page-title-template=template
Specifies the template for page titles.
  Consult the “TEMPLATE LANGUAGE” section for the template language
  description.
 
The default is to set no page titles.
Resolution, page size
-d, --dpi=resolution
Sets the desired resolution to
  resolution dots per inch. The default is 300 dpi. The allowed range is:
  72 ≤ resolution ≤ 6000.
--media-box
Use MediaBox to determine page size. CropBox
  is used by default.
--page-size=widthxheight
Sets the preferred page size to
  width pixels × height pixels. The actual page size may be
  altered in order to respect aspect ratio and DjVu limitations on resolution.
  (This option takes precedence over -d/--dpi.)
--guess-dpi
Try to guess native resolution by inspecting
  embedded images. Use with care.
Image quality
--bg-slices=n+...+n, --bg-slices=n,...,n
Specifies the encoding quality of the IW44
  background layer. This option is similar to the -slice option of
  c44. Consult the c44(1) manual page for details. The default is
  72+11+10+10.
--bg-subsample=n
Specifies the background subsampling ratio.
  The default is 3. Valid values are integers between 1 and 12, inclusive.
--fg-colors=default
Try to preserve all the foreground layer
  colors. This is the default.
--fg-colors=web
Reduce foreground layer colors to the web
  palette (216 colors). This option is not recommended.
--fg-colors=n
Use GraphicsMagick to reduce the number of
  distinct colors in the foreground layer to n. Valid values are integers
  between 1 and 4080. This option is not recommended.
--fg-colors=black
Discard any color information from the
  foreground layer.
--monochrome
Render pages as monochrome bitmaps. With this
  option, --bg-... and --fg-...
  options are not respected.
--loss-level=n
Specifies the aggressiveness of the lossy
  compression. The default is 0 (lossless). Valid values are integers between 0
  and 200, inclusive. This option is similar to the -losslevel option of
  cjb2; consult the cjb2(1) manual page for details. This option
  is respected only along with the --monochrome option.
--lossy
Synonym for --loss-level=100.
--anti-alias
Enable font and vector anti-aliasing. This
  option is not recommended.
Extraction
--no-metadata
Don't extract the metadata.
 
By default:
•The following entries of the document
  information dictionary are extracted: Title, Author, Subject, Creator,
  Producer, CreationDate, ModDate. Timestamps are formatted according to RFC
  3339[1], with date and time components separated by a single space.
•The XMP metadata is extracted (or
  created) and updated accordingly.
 
Note
If multiple input documents are specified, only metadata of the first one is
  taken into account.
 
--verbatim-metadata
Keep the original metadata intact.
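The timestamp layout used for extracted metadata (an RFC 3339-style timestamp whose date and time parts are separated by a single space rather than "T") can be illustrated with a short Python sketch. The helper name format_timestamp is hypothetical, and the exact way pdf2djvu writes the UTC offset is an assumption here.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """Format an aware datetime as 'YYYY-MM-DD HH:MM:SS+HH:MM' (illustrative only)."""
    if moment.tzinfo is None:
        # RFC 3339 timestamps carry a UTC offset, so naive values are rejected.
        raise ValueError("timestamp must include a time zone")
    # sep=" " replaces the usual "T" separator with a single space;
    # timespec="seconds" drops any fractional seconds.
    return moment.isoformat(sep=" ", timespec="seconds")


print(format_timestamp(datetime(2012, 1, 22, 13, 5, 0, tzinfo=timezone.utc)))
# prints: 2012-01-22 13:05:00+00:00
```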
--no-outline
Don't extract the document outline.
--hyperlinks=border-avis
Make hyperlink borders always visible.
 
By default, a hyperlink border is visible only when the mouse is over the
  hyperlink.
--hyperlinks=#RRGGBB
Force the specified border color for
  hyperlinks.
--no-hyperlinks, --hyperlinks=none
Don't extract hyperlinks.
--no-text
Don't extract the text.
--words
Extract the text. Record the location of every
  word. This is the default.
--lines
Extract the text. Record the location of every
  line, rather than every word.
--crop-text
Extract no text outside the page
  boundary.
--no-nfkc
Don't NFKC[2]-normalize the text.
--filter-text=command-line
Filter the text through the
  command-line. The provided filter must preserve whitespace, control
  characters and decimal digits.
 
This option implies --no-nfkc.
-p, --pages=page-range
Specifies pages to convert. page-range
  is a comma-separated list of sub-ranges. Each sub-range is either a single
  page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages
  are numbered from 1.
 
The default is to convert all pages.
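The page-range syntax described for -p/--pages can be sketched as a small parser. This is an illustration in Python, not pdf2djvu's own parser; the function name parse_page_range is hypothetical, and rejecting descending sub-ranges such as 42-37 is an assumption rather than documented behaviour.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a comma-separated list of sub-ranges into page numbers (illustrative only)."""
    pages: List[int] = []
    for part in spec.split(","):
        # A sub-range is either a single page ("17") or "first-last" ("37-42").
        first, dash, last = part.partition("-")
        low = int(first)
        high = int(last) if dash else low
        # Pages are numbered from 1; descending ranges are rejected (assumption).
        if low < 1 or high < low:
            raise ValueError("invalid sub-range: " + repr(part))
        pages.extend(range(low, high + 1))
    return pages


print(parse_page_range("17,37-42"))
# prints: [17, 37, 38, 39, 40, 41, 42]
```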
Performance
-j, --jobs=n
Use n threads to perform conversion.
  The default is to use one thread.
-j0, --jobs=0
Determine automatically how many threads to
  use to perform conversion.
Verbosity, help
-v, --verbose
Display more informational messages while
  converting the file.
-q, --quiet
Don't display informational messages while
  converting the file.
--version
Output version information and exit.
-h, --help
Display help and exit.
ENVIRONMENT
The following environment variables affect pdf2djvu on Unix systems:
OMP_*
Details of runtime behaviour with respect to
  parallelism can be controlled by several environment variables. Please refer
  to the OpenMP API specification[3] for details.
TMPDIR
TEMPLATE LANGUAGE
Template syntax
The template language is roughly modelled on the Python string formatting syntax[4]. A template is a piece of text which contains fields, surrounded by curly braces {}. Fields are replaced with appropriately formatted values when the template is evaluated. Moreover, {{ is replaced with a single { and }} is replaced with a single }.
Field syntax
Each field consists of a variable name, optionally followed by a shift, optionally followed by a format specification. The shift is a signed (i.e. starting with a + or - character) integer. The format specification consists of a colon, followed by a width specification. The width specification is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content. Preceding the width specification with a zero (0) character enables zero-padding. The width specification is optionally followed by an asterisk (*) character, which increases the minimum field width to the width of the longest possible content of the variable.
Available variables
page, spage
Page number in the PDF document.
dpage
Page number in the DjVu document.
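To make the field syntax concrete, here is a minimal sketch of a template evaluator in Python. It is not pdf2djvu's implementation: the function name expand and the values/maxima mappings are hypothetical, padding without the zero flag is assumed to use spaces on the left, the longest possible content for * is assumed to be the width of the largest value the variable can take (plus the shift), and stray single braces are simply left untouched.

```python
import re
from typing import Mapping

# Matches "{{", "}}", or a field: name, optional signed shift,
# optional ":" + optional "0" flag + width + optional "*".
_TOKEN = re.compile(
    r"\{\{|\}\}|"
    r"\{(?P<name>[a-z]+)(?P<shift>[+-][0-9]+)?"
    r"(?::(?P<zero>0)?(?P<width>[0-9]+)(?P<star>\*)?)?\}"
)


def expand(template: str, values: Mapping[str, int], maxima: Mapping[str, int]) -> str:
    """Evaluate a template; maxima gives the largest value of each variable (assumption)."""
    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name = match.group("name")
        shift = int(match.group("shift") or 0)
        content = str(values[name] + shift)
        width = int(match.group("width") or 0)
        if match.group("star"):
            # "*" raises the minimum width to that of the longest possible content.
            width = max(width, len(str(maxima[name] + shift)))
        fill = "0" if match.group("zero") else " "
        return content.rjust(width, fill)
    return _TOKEN.sub(replace, template)


print(expand("p{page:04*}.djvu", {"page": 7}, {"page": 120}))
# prints: p0007.djvu
```

With a document of 12345 pages, the same template would yield p00007.djvu under these assumptions, because the asterisk widens the field to five digits.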
IMPLEMENTATION DETAILS
Layer separation algorithm
Unless the --monochrome option is on, pdf2djvu uses the following naïve layer separation algorithm:
1.For each page, do the following:
 1.Raster the page into a pixmap, in the usual
  manner.
 2.Raster the page into another pixmap,
  omitting the following page elements:
  •text,
  •1 bit-per-pixel raster images,
  •vector elements (except fills of large
    areas).
 3.Compare both pixmaps, pixel by pixel:
  1.If their colors match, classify the pixel
    as a part of the background layer.
  2.Otherwise, classify the pixel as a part of
    the foreground layer.
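The comparison step of the algorithm above can be sketched in a few lines of Python. This is only an illustration of the pixel-by-pixel classification, not pdf2djvu's code: the function name separate_layers and the representation of a pixmap as rows of RGB tuples are hypothetical, and the two rasterization steps are assumed to have been done already.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # hypothetical RGB representation of one pixel


def separate_layers(
    full: Sequence[Sequence[Pixel]],
    reduced: Sequence[Sequence[Pixel]],
) -> List[List[bool]]:
    """Return a mask where True marks a foreground pixel (illustrative only).

    full    -- the page rasterized in the usual manner
    reduced -- the page rasterized without text, 1 bit-per-pixel images
               and most vector elements
    """
    if len(full) != len(reduced):
        raise ValueError("pixmaps must have the same height")
    mask: List[List[bool]] = []
    for row_full, row_reduced in zip(full, reduced):
        if len(row_full) != len(row_reduced):
            raise ValueError("pixmaps must have the same width")
        # Matching colors mean background; any difference means foreground.
        mask.append([a != b for a, b in zip(row_full, row_reduced)])
    return mask
```

A pixel that looks the same in both renderings was not touched by any omitted element, so it belongs to the background; every other pixel is attributed to the foreground.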
BUG REPORTS
If you find a bug in pdf2djvu, please report it at the issue tracker[5].
SEE ALSO
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
Author.
NOTES
1. RFC 3339
2. NFKC
3. OpenMP API specification
4. Python string formatting syntax
5. the issue tracker
| 01/22/2012 | pdf2djvu 0.7.12 |