NAME
hxextract - extract selected elements from an HTML or XML file
SYNOPSIS
hxextract [ -h | -? ] [ -x ] [ -s text ] [ -e text ] [ -b base ] element-or-class [ -c configfile | file-or-URL ]
DESCRIPTION
hxextract outputs all elements with a certain name and/or class.
Input must be well-formed, since no HTML heuristics are applied.
OPTIONS
The following options are supported:
- -x
- Use XML format conventions.
- -s text
- Insert text at the start of the output.
- -e text
- Insert text at the end of the output.
- -b base
- URL base
- -c configfile
- Read @chapter lines from configfile (lines must be
of the form "@chapter filename") and extract elements from each
of those files.
- -h, -?
- Print command usage.
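As an illustration of how -s, -e and -c might be combined, the sketch below builds a config file in the "@chapter filename" form and wraps the extracted elements in a container. The file names ch1.html, ch2.html and chapters.cfg, their contents, and the wrapper markup are all hypothetical. Options are placed before the operand, following the usual command-line convention; the exact output is not shown because it depends on the installed version.

```shell
#!/bin/sh
# Two small, well-formed sample files (hypothetical names and content).
cat > ch1.html <<'EOF'
<html><body><h2 class="example">First</h2><p>Text</p></body></html>
EOF
cat > ch2.html <<'EOF'
<html><body><h2 class="example">Second</h2><p>More text</p></body></html>
EOF

# Config file: one "@chapter filename" line per input file.
printf '@chapter %s\n' ch1.html ch2.html > chapters.cfg

# Run only if hxextract is installed on this system.
if command -v hxextract >/dev/null 2>&1; then
  # -s and -e add text at the start and end of the output;
  # -c takes the list of input files from chapters.cfg.
  hxextract -s '<div class="headings">' -e '</div>' -c chapters.cfg h2 ||
    echo "hxextract reported an error" >&2
fi
```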
OPERANDS
The following operands are supported:
- element-or-class
- The name of an element to extract (e.g., "H2"),
or the name of a class preceded by "." (e.g.,
".example") or a combination of both (e.g.,
"H2.example").
- file-or-URL
- A file name or a URL. To read from standard input, use
"-".
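The three forms of element-or-class, and reading from standard input with "-", could be tried as in the following sketch. The file sample.html and its contents are made up for the example, and the element name is written in lower case to match the sample markup.

```shell
#!/bin/sh
# A small, well-formed sample file (hypothetical name and content).
cat > sample.html <<'EOF'
<html><body>
<h2 class="example">Kept</h2>
<h2>Plain heading</h2>
<p class="example">Paragraph</p>
</body></html>
EOF

# Run only if hxextract is installed on this system.
if command -v hxextract >/dev/null 2>&1; then
  hxextract h2 sample.html          || true   # select by element name
  hxextract .example sample.html    || true   # select by class
  hxextract h2.example sample.html  || true   # select by element name and class
  hxextract h2 - < sample.html      || true   # "-" reads from standard input
fi
```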
ENVIRONMENT
To use a proxy to retrieve remote files, set the environment variables http_proxy and ftp_proxy. E.g.,
http_proxy="http://localhost:8080/"
BUGS
Remote files (specified with a URL) are currently only supported for HTTP.
Password-protected files or files that depend on HTTP "cookies" are
not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)
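One way to apply this workaround is to let curl(1) fetch the protected file and pipe the result into hxextract, which reads it from standard input via "-". The helper name fetch_and_extract, the credentials, and the URL below are placeholders invented for this sketch; the fetched document must still be well-formed.

```shell
#!/bin/sh
# fetch_and_extract is a hypothetical helper, not part of hxextract.
# Usage: fetch_and_extract USER:PASSWORD URL ELEMENT-OR-CLASS
fetch_and_extract() {
  # curl handles the authentication (-u) and fails on HTTP errors (-f);
  # hxextract then reads the retrieved document from standard input ("-").
  curl -fsS -u "$1" "$2" | hxextract "$3" -
}

# Example call with placeholder values (not run here, it needs network access):
# fetch_and_extract 'user:secret' 'https://example.org/private.html' h2
```

For files that depend on cookies, curl's -b option (with a cookie file) could take the place of -u in the same pipeline.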
SEE ALSO
hxselect(1)