htmldoc(1) | Michael R Sweet | htmldoc(1) |
NAME
htmldoc - convert html source files into html, postscript, or pdf.
SYNOPSIS
htmldoc [options] filename1.{html,md} [ ... filenameN.{html,md} ]
htmldoc [options] -
htmldoc [filename.book]
DESCRIPTION
Htmldoc(1) converts HTML and Markdown source files into indexed HTML, PostScript, or Portable Document Format (PDF) files that can be viewed online or printed. With no options an HTML document is produced on stdout.
The second form of htmldoc reads HTML source from stdin, which allows you to use htmldoc as a filter.
The third form of htmldoc launches a graphical interface that allows you to change options and generate documents interactively.
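The filter form can be sketched as a small shell function. This is a minimal illustration, not a canonical recipe: the function name html_to_pdf is made up here, and it assumes that output goes to stdout whenever no -f or -d option is given, as stated above for the default HTML output.

```shell
# html_to_pdf: read HTML on stdin, write PDF on stdout (hypothetical helper name).
# "-" tells htmldoc to read the HTML source from stdin.
# --webpage is used because ad-hoc input rarely has a book-style heading structure.
html_to_pdf() {
    htmldoc --webpage -t pdf -
}

# Usage (not run here):
#   printf '<html><body><p>Hello</p></body></html>\n' | html_to_pdf > hello.pdf
```

Wrapping the call in a function keeps the option list in one place when the same filter is used from several scripts.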
COMMON MISTAKES
There are two types of HTML files - structured documents using headings (H1, H2, etc.) which htmldoc calls "books", and unstructured documents that do not use headings which htmldoc calls "web pages".
A very common mistake is to try converting a web page using:
htmldoc -f filename.pdf filename.html
which will likely produce a PDF file with no pages. To convert web page files you must use the --webpage or --continuous options at the command-line or choose Web Page or Continuous in the input tab of the GUI.
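One way to avoid this mistake in scripts is to choose the mode from the file contents. The sketch below is only a heuristic based on the definition above (a file with H1 to H6 headings is treated as a book, anything else as a web page); the helper name pick_mode is hypothetical and htmldoc itself performs no such detection.

```shell
# pick_mode FILE: print "--book" if FILE contains an H1-H6 tag, else "--webpage".
# This is a rough text match, not an HTML parser.
pick_mode() {
    if grep -qiE '<h[1-6][[:space:]>]' "$1"; then
        echo "--book"
    else
        echo "--webpage"
    fi
}

# Usage (not run here):
#   htmldoc "$(pick_mode filename.html)" -f filename.pdf filename.html
```

When in doubt, passing --webpage explicitly is the safer choice, since a file with headings can still be converted as a web page.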
OPTIONS
The following command-line options are supported by htmldoc:
- --batch filename.book
- Generates the specified book file without opening the GUI.
- --bodycolor color
- Specifies the background color for all pages.
- --bodyfont {courier,helvetica,monospace,sans,serif,times}
- --textfont {courier,helvetica,monospace,sans,serif,times}
- Specifies the default typeface for all normal text.
- --bodyimage filename
- Specifies the background image that is tiled on all pages.
- --book
- Specifies that the HTML sources are structured (headings, chapters, etc.)
- --bottom margin
- Specifies the bottom margin in points (no suffix or ##pt), inches (##in), centimeters (##cm), or millimeters (##mm).
- --charset {cp-nnnn,iso-8859-1,...,iso-8859-15,utf-8}
- Specifies the character set to use for the output. Note: UTF-8 support is limited to the first 128 Unicode characters that are found in the input.
- --color
- Specifies that PostScript or PDF output should be in color.
- --continuous
- Specifies that the HTML sources are unstructured (plain web pages.) No page breaks are inserted between each file or URL in the output.
- --datadir directory
- Specifies the location of the htmldoc data files, usually /usr/share/htmldoc or C:/Program Files/HTMLDOC.
- --duplex
- Specifies that the output should be formatted for double-sided printing.
- --effectduration {0.1...10.0}
- Specifies the duration in seconds of PDF page transition effects.
- --embedfonts
- Specifies that fonts should be embedded in PDF and PostScript output.
- --encryption
- Enables encryption of PDF files.
- --fontsize size
- Specifies the default font size for body text.
- --fontspacing spacing
- Specifies the default line spacing for body text. The line spacing is a multiplier for the font size, so a value of 1.2 will provide an additional 20% of space between the lines.
- --footer fff
- Sets the page footer to use on body pages. See the HEADER/FOOTER FORMATS section below.
- --format format
- -t format
- Specifies the output format: epub, html, htmlsep (separate HTML files for each heading in the table-of-contents), ps or ps2 (PostScript Level 2), ps1 (PostScript Level 1), ps3 (PostScript Level 3), pdf11 (PDF 1.1/Acrobat 2.0), pdf12 (PDF 1.2/Acrobat 3.0), pdf or pdf13 (PDF 1.3/Acrobat 4.0), or pdf14 (PDF 1.4/Acrobat 5.0).
- --gray
- Specifies that PostScript or PDF output should be grayscale.
- --header fff
- Sets the page header to use on body pages. See the HEADER/FOOTER FORMATS section below.
- --header1 fff
- Sets the page header to use on the first body/chapter page. See the HEADER/FOOTER FORMATS section below.
- --headfootfont font
- Sets the font to use on headers and footers.
- --headfootsize size
- Sets the size of the font to use on headers and footers.
- --headingfont typeface
- Sets the typeface to use for headings.
- --help
- Displays a summary of command-line options.
- --helpdir directory
- Specifies the location of the htmldoc online help files, usually /usr/share/doc/htmldoc or C:/Program Files/HTMLDOC/DOC.
- --hfimageN filename
- Specifies an image (numbered from 1 to 10) to be used in the header or footer in a PostScript or PDF document.
- --jpeg[=quality]
- Sets the JPEG compression level to use for large images. A value of 0 disables JPEG compression.
- --left margin
- Specifies the left margin in points (no suffix or ##pt), inches (##in), centimeters (##cm), or millimeters (##mm).
- --letterhead filename
- Specifies an image to be used as a letterhead in the header or footer in a PostScript or PDF document. Note that you need to use the --header, --header1, and/or --footer options with the L parameter or use the corresponding HTML page comments to display the letterhead image in the header or footer.
- --linkcolor color
- Sets the color of links.
- --links
- Enables generation of links in PDF files (default).
- --linkstyle {plain,underline}
- Sets the style of links.
- --logoimage filename
- Specifies an image to be used as a logo in the header or footer in a PostScript or PDF document, and in the navigation bar of a HTML document. Note that you need to use the --header, --header1, and/or --footer options with the l parameter or use the corresponding HTML page comments to display the logo image in the header or footer.
- --no-compression
- Disables compression of PostScript or PDF files.
- --no-duplex
- Disables double-sided printing.
- --no-embedfonts
- Specifies that fonts should not be embedded in PDF and PostScript output.
- --no-encryption
- Disables document encryption.
- --no-jpeg
- Disables JPEG compression of large images.
- --no-links
- Disables generation of links in a PDF document.
- --no-numbered
- Disables automatic heading numbering.
- --no-pscommands
- Disables generation of PostScript setpagedevice commands.
- --no-strict
- Disables strict HTML input checking.
- --no-title
- Disables generation of a title page.
- --no-toc
- Disables generation of a table of contents.
- --numbered
- Numbers all headings in a document.
- --nup pages
- Sets the number of pages that are placed on each output page. Valid values are 1, 2, 4, 6, 9, and 16.
- --outdir directory
- -d directory
- Specifies that output should be sent to a directory in multiple files. (Not compatible with PDF output)
- --outfile filename
- -f filename
- Specifies that output should be sent to a single file.
- --owner-password password
- Sets the owner password for encrypted PDF files.
- --pageduration {1.0...60.0}
- Sets the view duration of a page in a PDF document.
- --pageeffect effect
- Specifies the page transition effect for all pages; this attribute is ignored by all Adobe PDF viewers.
- --pagelayout {single,one,twoleft,tworight}
- Specifies the initial layout of pages for a PDF file.
- --pagemode {document,outlines,fullscreen}
- Specifies the initial viewing mode for a PDF file.
- --path
- Specifies a search path for files in a document.
- --permissions permission[,permission,...]
- Specifies document permissions for encrypted PDF files. The following permissions are understood: all, none, annotate, no-annotate, copy, no-copy, modify, no-modify, print, and no-print. Separate multiple permissions with commas.
- --pre-indent distance
- Specifies the indentation of pre-formatted text in points (no suffix or ##pt), inches (##in), centimeters (##cm), or millimeters (##mm).
- --pscommands
- Specifies that PostScript setpagedevice commands should be included in the output.
- --quiet
- Suppresses all messages, even error messages.
- --referer url
- Specifies the URL that is passed in the Referer: field of HTTP requests.
- --right margin
- Specifies the right margin in points (no suffix or ##pt), inches (##in), centimeters (##cm), or millimeters (##mm).
- --size pagesize
- Specifies the page size using a standard name or in points (no suffix or ##x##pt), inches (##x##in), centimeters (##x##cm), or millimeters (##x##mm). The standard sizes that are currently recognized are "letter" (8.5x11in), "legal" (8.5x14in), "a4" (210x297mm), and "universal" (8.27x11in).
- --strict
- Enables strict HTML input checking.
- --textcolor color
- Specifies the default color of all text.
- --title
- Enables the generation of a title page.
- --titlefile filename
- --titleimage filename
- Specifies the file to use for the title page. If the file is an image then the title page is automatically generated using the document meta data and title image.
- --tocfooter fff
- Sets the page footer to use on table-of-contents pages. See the HEADER/FOOTER FORMATS section below.
- --tocheader fff
- Sets the page header to use on table-of-contents pages. See the HEADER/FOOTER FORMATS section below.
- --toclevels levels
- Sets the number of levels in the table-of-contents.
- --toctitle string
- Sets the title for the table-of-contents.
- --top margin
- Specifies the top margin in points (no suffix or ##pt), inches (##in), centimeters (##cm), or millimeters (##mm).
- --user-password password
- Specifies the user password for encryption of PDF files.
- --verbose
- -v
- Provides verbose messages.
- --version
- Displays the current version number.
- --webpage
- Specifies that the HTML sources are unstructured (plain web pages.) A page break is inserted between each file or URL in the output.
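The margin and indentation options accept the same measurement in several units. As a rough illustration of how those units relate, the sketch below converts a value to points, assuming the usual PostScript definition of 72 points per inch. The helper name to_points is hypothetical and is not part of htmldoc.

```shell
# to_points VALUE: convert a measurement in htmldoc's notation to points.
# Accepts a bare number or ##pt (points), ##in, ##cm, or ##mm.
# Assumes 1 inch = 72 points and 1 inch = 2.54 cm = 25.4 mm.
to_points() {
    case $1 in
        *in) awk -v v="${1%in}" 'BEGIN { printf "%.1f\n", v * 72 }' ;;
        *cm) awk -v v="${1%cm}" 'BEGIN { printf "%.1f\n", v * 72 / 2.54 }' ;;
        *mm) awk -v v="${1%mm}" 'BEGIN { printf "%.1f\n", v * 72 / 25.4 }' ;;
        *pt) awk -v v="${1%pt}" 'BEGIN { printf "%.1f\n", v }' ;;
        *)   awk -v v="$1"      'BEGIN { printf "%.1f\n", v }' ;;
    esac
}

to_points 1in
to_points 2.54cm
to_points 25.4mm
to_points 72
```

All four calls describe the same distance, so under these assumptions --left 1in, --left 2.54cm, --left 25.4mm, and --left 72 should request the same margin.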
EXIT STATUS
Htmldoc returns a non-zero exit status if any errors are seen, zero otherwise.
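In a script, the exit status can be used to stop a build when a conversion fails. The following is a minimal sketch; the wrapper name run_checked and its messages are illustrative only.

```shell
# run_checked CMD [ARGS...]: run a command and report the result from its exit status.
# Returns the command's own exit status so callers can react to it.
run_checked() {
    if "$@"; then
        echo "conversion succeeded"
    else
        status=$?
        echo "conversion failed (exit status $status)" >&2
        return "$status"
    fi
}

# Usage (not run here):
#   run_checked htmldoc --webpage -f example.pdf example.html || exit 1
```

Note that --quiet suppresses error messages as well, so the exit status may be the only failure signal when that option is used.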
HEADER/FOOTER FORMATS
The header and footer of each page can contain up to three preformatted values. These values are specified using a single character for the left, middle, and right of the page, resulting in the fff notation shown previously.
Each character can be one of the following:
- .
- blank
- /
- n/N arabic page numbers (1/3, 2/3, 3/3)
- :
- c/C arabic chapter page numbers (1/2, 2/2, 1/4, 2/4, ...)
- 1
- arabic numbers (1, 2, 3, ...)
- a
- lowercase letters
- A
- uppercase letters
- c
- current chapter heading
- C
- current chapter page number (arabic)
- d
- current date
- D
- current date and time
- h
- current heading
- i
- lowercase roman numerals
- I
- uppercase roman numerals
- l
- logo image
- L
- logo image as letterhead - the image is inserted at its maximum size
- t
- title text
- T
- current time
- u
- current filename or URL
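As an illustration of the fff notation, the sketch below checks that a format string is exactly three characters taken from the list above. The helper name valid_hf is hypothetical; htmldoc does its own parsing and this check is only a convenience for scripts.

```shell
# valid_hf FFF: succeed if FFF is exactly three characters, each one of
# . / : 1 a A c C d D h i I l L t T u (left, middle, right).
valid_hf() {
    case $1 in
        [1./:aAcCdDhiIlLtTu][1./:aAcCdDhiIlLtTu][1./:aAcCdDhiIlLtTu]) return 0 ;;
        *) return 1 ;;
    esac
}

valid_hf ".t." && echo ".t. : blank, title text, blank"
valid_hf "h.1" && echo "h.1 : current heading, blank, arabic page number"
valid_hf "xyz" || echo "xyz : not a valid format"

# Usage (not run here):
#   htmldoc --book --header .t. --footer h.1 -f example.pdf *.html
```

Each position is independent, so a centered title with nothing on either side is written as .t. rather than t.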
ENVIRONMENT
HTMLDOC looks for several environment variables which can override the default directories, display additional debugging information, and disable CGI mode:
- HTMLDOC_DATA
- This environment variable specifies the location of htmldoc's data and fonts directories, normally /usr/share/htmldoc or C:/Program Files/HTMLDOC.
- HTMLDOC_DEBUG
- This environment variable enables debugging information that is sent to stderr. The value is a list of any of the following keywords separated by spaces: "all", "links", "memory", "remotebytes", "table", "tempfiles", and/or "timing".
- HTMLDOC_HELP
- This environment variable specifies the location of htmldoc's documentation directory, normally /usr/share/doc/htmldoc or C:/Program Files/HTMLDOC/doc.
- HTMLDOC_NOCGI
- This environment variable, when set (the value doesn't matter), disables CGI mode. It is most useful when htmldoc is run on a web server from a scripting language or invoked from another program.
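These variables can be set for a single invocation without changing the calling environment. The sketch below is one possible wrapper; the function name run_htmldoc and the HTMLDOC_BIN override are assumptions made for this example and are not read by htmldoc.

```shell
# run_htmldoc ARGS...: run htmldoc with CGI mode disabled and link/timing
# debugging enabled, for this one invocation only.
# HTMLDOC_BIN is a hypothetical override used only by this sketch (for
# example to point at a specific build); htmldoc does not read it.
run_htmldoc() {
    HTMLDOC_NOCGI=1 HTMLDOC_DEBUG="links timing" "${HTMLDOC_BIN:-htmldoc}" "$@"
}

# Usage (not run here); debugging output goes to stderr:
#   run_htmldoc --webpage -f example.pdf example.html 2> debug.log
```

Setting HTMLDOC_NOCGI this way is mainly relevant when the caller is itself a web server process.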
EXAMPLES
Create a PDF file from a web site:
htmldoc --webpage -f example.pdf http://www.example.com/
Create a PostScript book from a directory of HTML files:
htmldoc --book -f example.ps *.html
SEE ALSO
HTMLDOC Users Manual
AUTHOR
Michael R Sweet
LEGAL STUFF
HTMLDOC is copyright © 1997-2023 by Michael R Sweet.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
HTMLDOC 1.9.17 | 2023-09-13 |