NAME¶
urifind - find URIs in a document and dump them to STDOUT.
SYNOPSIS¶
$ urifind file
DESCRIPTION¶
urifind is a simple script that finds URIs in one or more files (using
"URI::Find"), and outputs them to to STDOUT. That's it.
To find all the URIs in
file1, use:
$ urifind file1
To find the URIs in multiple files, simply list them as arguments:
$ urifind file1 file2 file3
urifind will read from "STDIN" if no files are given or if a
filename of "-" is specified:
$ wget http://www.boston.com/ -O - | urifind
When multiple files are listed,
urifind prefixes each found URI with the
file from which it came:
$ urifind file1 file2
file1: http://www.boston.com/index.html
file2: http://use.perl.org/
This can be turned on for single files with the "-p"
("prefix") switch:
$urifind -p file3
file1: http://fsck.com/rt/
It can also be turned off for multiple files with the "-n" ("no
prefix") switch:
$ urifind -n file1 file2
http://www.boston.com/index.html
http://use.perl.org/
By default, URIs will be displayed in the order found; to sort them
ascii-betically, use the "-s" ("sort") option. To reverse
sort them, use the "-r" ("reverse") flag ("-r"
implies "-s").
$ urifind -s file1 file2
http://use.perl.org/
http://www.boston.com/index.html
mailto:webmaster@boston.com
$ urifind -r file1 file2
mailto:webmaster@boston.com
http://www.boston.com/index.html
http://use.perl.org/
Finally,
urifind supports limiting the returned URIs by scheme or by
arbitrary pattern, using the "-S" option (for schemes) and the
"-P" option. Both "-S" and "-P" can be specified
multiple times:
$ urifind -S mailto file1
mailto:webmaster@boston.com
$ urifind -S mailto -S http file1
mailto:webmaster@boston.com
http://www.boston.com/index.html
"-P" takes an arbitrary Perl regex. It might need to be protected from
the shell:
$ urifind -P 's?html?' file1
http://www.boston.com/index.html
$ urifind -P '\.org\b' -S http file4
http://www.gnu.org/software/wget/wget.html
Add a "-d" to have
urifind dump the refexen generated from
"-S" and "-P" to "STDERR". "-D" does
the same but exits immediately:
$ urifind -P '\.org\b' -S http -D
$scheme = '^(\bhttp\b):'
@pats = ('^(\bhttp\b):', '\.org\b')
To remove duplicates from the results, use the "-u"
("unique") switch.
OPTION SUMMARY¶
- -s
- Sort results.
- -r
- Reverse sort results (implies -s).
- -u
- Return unique results only.
- -n
- Don't include filename in output.
- -p
- Include filename in output (0 by default, but 1 if multiple
files are included on the command line).
- -P $re
- Print only lines matching regex '$re' (may be specified
multiple times).
- -S $scheme
- Only this scheme (may be specified multiple times).
- -h
- Help summary.
- -v
- Display version and exit.
- -d
- Dump compiled regexes for "-S" and "-P"
to "STDERR".
- -D
- Same as "-d", but exit after dumping.
AUTHOR¶
darren chamberlain <darren@cpan.org>
COPYRIGHT¶
(C) 2003 darren chamberlain
This library is free software; you may distribute it and/or modify it under the
same terms as Perl itself.
SEE ALSO¶
URI::Find