NAME¶

dumppdf - dumps internal contents of a PDF files

SYNOPSIS¶

dumppdf [option...] file...

DESCRIPTION¶

dumppdf dumps the internal contents of a PDF file in pseudo-XML format. This program is primarily for debugging purposes, but it's also possible to extract some meaningful contents

OPTIONS¶

-a

Dump all the objects. By default only the document trailer is printed.

-i objno[,objno,...]

Specifies PDF object IDs to display. Comma-separated IDs, or multiple -i options are accepted.

-p pageno[,pageno,...]

Specifies the comma-separated list of the page numbers to be extracted. Page numbers start at one. By default, it extracts text from all the pages.

-r, -b, -t

Specifies the output format of stream contents. Because the contents of stream objects can be very large, they are omitted when none of the options above is specified.

With -r option, the “raw” stream contents are dumped without decompression. With -b option, the decompressed contents are dumped as a binary blob. With -t option, the decompressed contents are dumped in a text format, similar to repr() manner. When -r or -b option is given, no stream header is displayed for the ease of saving it to a file.

-T

Show the table of contents.

-P password

Provides the user password to access PDF contents.

-d

Increase the debug level.

EXAMPLES¶

Dump all the headers and contents, except stream objects:

$ dumppdf -a test.pdf

Dump the table of contents:

$ dumppdf -T test.pdf

Extract a JPEG image:

$ dumppdf -r -i6 test.pdf > image.jpeg

AUTHORS¶

Jakub Wilk <jwilk@debian.org>

Wrote this manual page for the Debian system.

Yusuke Shinyama <yusuke@cs.nyu.edu>

Author of PDFMiner and its original HTML documentation.

11/05/2015

dumppdf

Source file:	dumppdf.1.en.gz (from python-pdfminer 20140328+dfsg-1)
Source last updated:	2015-11-05T11:38:33Z
Converted to HTML:	2019-06-03T07:39:55Z