PDFGREP(1) | Pdfgrep Manual | PDFGREP(1) |
NAME
pdfgrep - search PDF files for a regular expression
SYNOPSIS
pdfgrep [OPTION...] PATTERN FILE...
pdfgrep [OPTION...] {-e PATTERN|-f FILE}... FILE...
pdfgrep [OPTION...] -r|-R PATTERN [FILE|DIR...]
pdfgrep [OPTION...] -r|-R {-e PATTERN|-f FILE}... [FILE|DIR...]
DESCRIPTION
Search for PATTERN in each PDF FILE and print matching lines. By default, PATTERN is an extended regular expression.
pdfgrep aims to be mostly compatible with GNU grep, with some PDF-specific differences and additional options. Most notably, -n prints page numbers instead of line numbers.
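Because -n prefixes each match with its page number, the output can be post-processed with standard tools. The following is a minimal sketch, not part of pdfgrep: the helper name matches_per_page is hypothetical, and it assumes single-file output of the form PAGE:TEXT with the default ":" prefix separator.

```shell
# Count matches per page from `pdfgrep -n` output read on stdin.
# Assumption: each input line looks like PAGE:TEXT (single file, default separator).
matches_per_page() {
    cut -d: -f1 | sort -n | uniq -c | awk '{ print $2 ": " $1 }'
}

# Demonstration with simulated output, so no PDF is needed:
printf '%s\n' '3:first hit' '3:second hit' '7:third hit' | matches_per_page
```

With a real file, the helper would be fed by something like pdfgrep -n pattern foo.pdf | matches_per_page.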
OPTIONS
General Information
--help
-V, --version
Pattern Interpretation
-F, --fixed-strings
-P, --perl-regexp
Matching Control
-e PATTERN, --regexp=PATTERN
-f FILE, --file=FILE
-i, --ignore-case
General Output Control
-c, --count
-p, --page-count
--color WHEN
always | Always use colors, even when stdout is not a terminal. |
never | Do not use colors. |
auto | Use colors only when stdout is a terminal (this is the default). |
-L, --files-without-match
-l, --files-with-matches
-m, --max-count NUM
-o, --only-matching
-q, --quiet
Line Prefix Control
-H, --with-filename
-h, --no-filename
-n, --page-number[=TYPE]
The optional argument TYPE controls how page numbers are determined. If present, it can be one of the following:
index | The index of the page in the PDF, starting at 1 for the first page. This is the default. |
label | The PDF page label. This can be an arbitrary string per page, set by the PDF creator (e.g. roman numerals for the first few pages). |
-Z, --null
--match-prefix-separator SEP
Context Control
-A NUM, --after-context=NUM
-B NUM, --before-context=NUM
-C NUM, --context=NUM
File Selection
-r, --recursive
-R, --dereference-recursive
--exclude=GLOB
--include=GLOB
Other Options
--cache
--password=PASSWORD
--page-range=RANGE
--debug
--warn-empty
--unac
This option is experimental and only available if pdfgrep is compiled with unac support.
EXIT STATUS
Normally, the exit status is 0 if at least one match is found, 1 if no match is found, and 2 if an error occurred. However, if the --quiet or -q option is used and a match is found, pdfgrep returns 0 regardless of errors.
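A script can branch on these values. The following is a minimal sketch: the function name describe_status and the file name report.pdf are placeholders chosen for illustration, and the pdfgrep call runs only if the program and the file are present.

```shell
# Translate an exit status, as documented above, into a short label.
describe_status() {
    case "$1" in
        0) echo "match found" ;;
        1) echo "no match" ;;
        2) echo "error" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# report.pdf is a placeholder; the search is skipped if pdfgrep or the file is missing.
if command -v pdfgrep >/dev/null 2>&1 && [ -f report.pdf ]; then
    pdfgrep -c 'pattern' report.pdf
    describe_status "$?"
fi
```

The sketch avoids -q on purpose, since with that option a status of 0 no longer rules out errors.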
ENVIRONMENT VARIABLES
The behavior of pdfgrep is affected by the following environment variable.
GREP_COLORS
FILES
${XDG_CACHE_HOME}/pdfgrep/*
EXAMPLES
Print the first ten lines matching pattern and print their page number:
pdfgrep -n --max-count 10 pattern foo.pdf
Search all .pdf files whose names begin with foo recursively in the current directory:
pdfgrep -r --include "foo*.pdf" pattern
Search for foo in all PDFs in the current directory that also contain bar:
pdfgrep -Z --files-with-matches "bar" *.pdf | xargs -0 pdfgrep -H foo
Search all .pdf files that are smaller than 12M recursively in the current directory:
find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern
Note that, in contrast to the first two examples, this task cannot be solved with pdfgrep alone; the Unix tools find(1) and xargs(1) are needed, because pdfgrep itself has no option to exclude files by size. But as you see, it doesn’t have to!
Search all .pdf files in the current directory in parallel on a multicore CPU:
find . -name "*.pdf" -print0 | parallel -q0 pdfgrep -H foobar
This uses GNU parallel(1) in addition to find(1) to search multiple files in parallel on multicore processors. This can give a good speedup if you have multiple files to search and an underused CPU.
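Where GNU parallel is not installed, a similar effect may be achievable with the -P option of xargs(1), which is an extension found in GNU and BSD xargs rather than part of POSIX. The following is a sketch under that assumption; the function name search_parallel and the PDFGREP override variable are hypothetical and exist only to make the command easy to dry-run.

```shell
# Search every .pdf below the current directory, one file per pdfgrep process.
# $1 = pattern, $2 = number of parallel jobs (defaults to 4).
# PDFGREP is a hypothetical override; it defaults to the real pdfgrep binary.
search_parallel() {
    find . -name '*.pdf' -print0 |
        xargs -0 -n 1 -P "${2:-4}" "${PDFGREP:-pdfgrep}" -H "$1"
}
```

For example, search_parallel foobar 8 would run up to eight searches at once. Unlike parallel(1), xargs does not group the output of each job, so lines from different files may appear interleaved.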
BUGS
Reporting Bugs
Bugs can be reported either to the mailing list (pdfgrep-users@pdfgrep.org) or to the bug tracker on GitLab (https://gitlab.com/pdfgrep/pdfgrep/issues).
AUTHORS
pdfgrep is maintained by Hans-Peter Deifel.
See the AUTHORS file in the source for a full list of contributors.
SEE ALSO
See pdfgrep’s website, https://pdfgrep.org, for more information, downloads, and the git repository.
03/15/2024 | Pdfgrep 2.1.2 |