NAME¶
pwget - Perl Web URL fetch program
SYNOPSIS¶
pwget http://example.com/ [URL ...]
pwget --config $HOME/config/pwget.conf --tag linux --tag emacs ..
pwget --verbose --overwrite http://example.com/
pwget --verbose --overwrite --Output ~/dir/ http://example.com/
pwget --new --overwrite http://example.com/package-1.1.tar.gz
DESCRIPTION¶
Automate periodic downloads of files and packages.
If you periodically retrieve the latest versions of certain programs, this is
the Perl script for you. Run it from a cron job, or once a week, to download
the newest versions of files from around the net.
Wget and this program¶
At this point you may wonder why you would need this Perl program when the
wget(1) C program has been the standard for ages. Well, 1) Perl is
cross-platform and more easily extendable, 2) you can record file download
criteria in a configuration file and use Perl regular expressions to select
downloads, 3) the program can analyze web pages and "search" for the
download links as instructed, and 4) last but not least, it can track the
newest packages whose names have changed since the last download. There are
heuristics to determine the newest file or package according to a file name
skeleton defined in the configuration.
This program does not replace
wget(1) because it does not offer as many
options as wget, like recursive downloads and date comparison. Use wget for ad
hoc downloads and this utility for files that change (new releases of
archives) or which you monitor periodically.
Short introduction¶
This small utility makes it possible to keep a list of URLs in a configuration
file and periodically retrieve those pages or files with simple commands. This
utility is best suited for small batch jobs to download e.g. most recent
versions of software files. If you use a URL whose file is already on disk, be
sure to supply option
--overwrite to allow overwriting existing files.
While you can run this program from the command line to retrieve individual
files, the program has been designed to use a separate configuration file via
the --config option. In the configuration file you can control the
downloading with separate directives like "save:", which tells the program
to save the file under a different name. The simplest way to retrieve the
latest version of a package from a site is:
pwget --new --overwrite --verbose \
http://www.example.com/package-1.00.tar.gz
Do not worry about the filename "package-1.00.tar.gz". The latest
version, say, "package-3.08.tar.gz", will be retrieved. The option
--new instructs the program to find a newer version than the one named in the
provided URL.
If the URL ends in a slash, then the directory listing of the remote machine
is stored to file:
!path!000root-file
The content of this file is either index.html or the directory listing,
depending on whether the HTTP or FTP protocol was used.
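For example, a sketch of retrieving a directory URL; the stored name is
derived from the URL path, with "!" standing in for the slashes:
pwget --overwrite --verbose http://www.example.com/dir/
-->  !dir!000root-file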
OPTIONS¶
- -A, --regexp-content REGEXP
- Analyze the content of the file and match REGEXP. The file is downloaded
only if the regexp matches its content. This option makes downloads slow,
because each file is read into memory as a single line and the match is then
searched against that content.
For example, to download Emacs Lisp files (.el) written by Mr. Foo, matched
case-insensitively:
pwget -v -r '\.el$' -A "(?i)Author: Mr. Foo" \
http://www.emacswiki.org/elisp/index.html
- -C, --create-paths
- Create paths that do not exist in "lcd:"
directives.
By default, any "lcd:" directive pointing to a non-existing directory
interrupts the program. With this option, local directories are created as
needed, making it possible to re-create the exact directory structure defined
in the configuration file.
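A sketch, assuming the configuration file contains an "lcd:" directive that
points to a directory which does not exist yet:
lcd: $HOME/download/new-package
http://www.example.com/new-package-1.0.tar.gz
Running pwget with --create-paths creates $HOME/download/new-package on the
fly instead of interrupting the program.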
- -c, --config FILE
- This option can be given multiple times. All configurations
are read.
Read URLs from the configuration file. If no configuration file is given, the
file pointed to by the environment variable is read. See ENVIRONMENT.
The configuration file layout is explained in section CONFIGURATION
FILE.
- --chdir DIRECTORY
- Do a chdir() to DIRECTORY before any URL download
starts. This is like doing:
cd DIRECTORY
pwget http://example.com/index.html
- -d, --debug [LEVEL]
- Turn on debug with positive LEVEL number. Zero means no
debug. This option turns on --verbose too.
- -e, --extract
- Unpack any files after retrieving them. The commands to
unpack typical archive files are defined in the program. Make sure these
programs are along PATH. Win32 users are encouraged to install the Cygwin
utilities, where these programs come standard. Refer to section SEE ALSO.
.tar => tar
.tgz => tar + gzip
.gz => gzip
.bz2 => bzip2
.zip => unzip
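For example, a sketch that downloads an archive and unpacks it in one run
(tar and gzip are assumed to be along PATH):
pwget --extract --overwrite http://www.example.com/package-1.1.tar.gz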
- -F, --firewall FIREWALL
- Use FIREWALL when accessing files via ftp:// protocol.
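A sketch of a possible invocation, where proxy.example.com is an assumed
firewall host:
pwget --firewall proxy.example.com ftp://ftp.example.com/dir/file.txt.gz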
- -h, --help
- Print help page in text.
- --help-html
- Print help page in HTML.
- --help-man
- Print help page in Unix manual page format. You want to
feed this output to nroff -man in order to read it.
- -m, --mirror SITE
- If the URL points to the Sourceforge download area, use mirror SITE
for downloading. Alternatively, the full URL can include the mirror
information. An example:
--mirror kent http://downloads.sourceforge.net/foo/foo-1.0.0.tar.gz
- -n, --new
- Get newest file. This applies to data files, which do not
have the extension .asp or .html. When new releases are announced, the
version number in the filename usually tells which is the current one, so
getting a hardcoded file with:
pwget -o -v http://example.com/dir/program-1.3.tar.gz
is not usually practical from an automation point of view. Adding the --new
option to the command line causes a double pass: a) the whole
http://example.com/dir/ is examined for all files and b) files matching
approximately the filename program-1.3.tar.gz are examined, heuristically
sorted, and the file with the latest version number is retrieved.
- --no-lcd
- Ignore "lcd:" directives in configuration file.
In the configuration file, any "lcd:" directives are obeyed as
they are seen. If you want to retrieve a URL to your current
directory instead, be sure to supply this option. Otherwise the file will end
up in the directory pointed to by "lcd:".
- --no-save
- Ignore "save:" directives in configuration file.
If the URLs have "save:" options, they are ignored during fetch.
You usually want to combine --no-lcd with --no-save.
- --no-extract
- Ignore "x:" directives in configuration
file.
- -O, --output DIR
- Before retrieving any files, chdir to DIR.
- -o, --overwrite
- Allow overwriting existing files when retrieving URLs.
Combine this with --skip-version if you periodically update
files.
- --proxy PROXY
- Use PROXY server for HTTP. (See --firewall for
FTP.) The port number is optional in the call:
--proxy http://example.com.proxy.com
--proxy example.com.proxy.com:8080
- -p, --prefix PREFIX
- Add PREFIX to all retrieved files.
- -P, --postfix POSTFIX
- Add POSTFIX to all retrieved files.
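For example (a sketch; the exact resulting name is an assumption based on the
prefix examples below):
pwget --prefix new- --postfix .bak http://www.example.com/file.txt
-->  new-file.txt.bak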
- -D, --prefix-date
- Add an ISO 8601 ":YYYY-MM-DD" prefix to all retrieved
files. This is added before a possible --prefix-www or
--prefix.
- -W, --prefix-www
- Usually the files are stored with the same name as in the
URL dir, but if you retrieve files that have identical names you can store
each page separately so that the file name is prefixed by the site name.
http://example.com/page.html --> example.com::page.html
http://example2.com/page.html --> example2.com::page.html
- -r, --regexp REGEXP
- Retrieve files matching REGEXP at the destination URL site. This is
like "connect to the URL and get all files matching REGEXP".
Here all gzip-compressed files are found from an HTTP server directory:
pwget -v -r "\.gz" http://example.com/archive/
- -R, --config-regexp REGEXP
- Retrieve URLs matching REGEXP from the configuration file. This
cancels any --tag options on the command line.
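For example, a sketch that downloads only the configuration entries whose
URLs match "kernel" (assuming such entries exist in the file):
pwget -v --config $HOME/config/pwget.conf --config-regexp kernel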
- -s, --selftest
- Run some internal tests. For maintainer or developer
only.
- --sleep SECONDS
- Sleep SECONDS before the next URL request. When using regexp-based
downloads that may return many hits, some sites disallow successive
requests within a short period of time. This option makes the program sleep
for a number of SECONDS between retrievals to overcome 'Service
unavailable'.
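For example, a sketch that pauses five seconds between successive retrievals:
pwget --sleep 5 -r "\.gz" http://www.example.com/archive/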
- --stdout
- Retrieve URL and write to stdout.
- --skip-version
- Do not download files that have a version number and which
already exist on disk. Suppose you have these files and you use option
--skip-version:
package.tar.gz
file-1.1.tar.gz
Only package.tar.gz is retrieved, because file-1.1.tar.gz contains a version
number and the file has not changed since the last retrieval. The idea is
that in every release the version number in the distribution increases, but
there may be distributions which do not contain a version number. At regular
intervals you may want to download those packages again, but skip versioned
files. In short: this option does not make much sense without the additional
option --new.
If you want to reload a versioned file again, add option
--overwrite.
- -t, --test, --dry-run
- Run in test mode.
- -T, --tag NAME [NAME] ...
- Search for tag NAME in the config file and download only
entries defined under that tag. Refer to the --config FILE option
description. You can give multiple --tag switches. Combining this
option with --regexp does not make sense and the consequences are
undefined.
- -v, --verbose [NUMBER]
- Print verbose messages.
- -V, --version
- Print version information.
EXAMPLES¶
Get files from site:
pwget http://www.example.com/dir/package.tar.gz ..
Display copyright file for package GNU make from Debian pages:
pwget --stdout --regexp 'copyright$' http://packages.debian.org/unstable/make
Get all mailing list archive files that match "gz":
pwget --regexp gz http://example.com/mailing-list/archive/download/
Read a directory and store it to filename YYYY-MM-DD::!dir!000root-file.
pwget --prefix-date --overwrite --verbose http://www.example.com/dir/
To update to the newest version of a package, but only if there is none on
disk already. The
--new option instructs the program to find newer packages; the
filename is only used as a skeleton for the files to look for:
pwget --overwrite --skip-version --new --verbose \
ftp://ftp.example.com/dir/packet-1.23.tar.gz
To overwrite file and add a date prefix to the file name:
pwget --prefix-date --overwrite --verbose \
http://www.example.com/file.pl
--> YYYY-MM-DD::file.pl
To add date and WWW site prefix to the filenames:
pwget --prefix-date --prefix-www --overwrite --verbose \
http://www.example.com/file.pl
--> YYYY-MM-DD::www.example.com::file.pl
Get all updated files under the configuration file's tag updates:
pwget --verbose --overwrite --skip-version --new --tag updates
pwget -v -o -s -n -T updates
Get files, as they are listed in the configuration file, into the current
directory, ignoring any "lcd:" and "save:" directives:
pwget --config $HOME/config/pwget.conf \
--no-lcd --no-save --overwrite --verbose \
http://www.example.com/file.pl
To check the configuration file, run the program with a non-matching regexp;
it parses the file and checks the "lcd:" directives on the way:
pwget -v -R dummy-regexp
-->
pwget.DirectiveLcd: LCD [$EUSR/directory ...]
is not a directory at /users/foo/bin/pwget line 889.
CONFIGURATION FILE¶
The configuration file is NOT Perl code. Comments start with the hash character (#).
Variables¶
At this point, variable expansions happen only in
lcd:. Do not try to use
them anywhere else, like in URLs.
Path variables for
lcd: are defined using the following notation; spaces are
not allowed in the VALUE part (no directory names with spaces). Variable
names are case sensitive. Variables with the same name as an environment
variable take their value from the environment; environment variables are
immediately available.
VARIABLE = /home/my/dir # define variable
OTHER = $VARIABLE/some/file # Use previously defined variable
FTP = $HOME/ftp # Use environment variable
The right-hand side can refer to previously defined variables or to existing
environment variables. To repeat: this is not Perl code, although it may look
like it; it is just the allowed syntax of the configuration file. Notice that
there is a dollar sign on the right-hand side when a variable is referenced,
but no dollar sign on the left-hand side when a variable is defined. Here is
an example of possible configuration file content. The tags are
hierarchically ordered without a limit.
Warning: remember to use different variable names in separate include files.
All variables are global.
Include files¶
It is possible to include more configuration files with statement
INCLUDE <path-to-file-name>
Variable expansions are possible in the file name. There is no limit on how
many or how deeply nested include files are used. Every file is included only
once, so it is safe to have multiple includes of the same file. Every include
is read, so put the most important override includes last:
INCLUDE <etc/pwget.conf> # Global
INCLUDE <$HOME/config/pwget.conf> # HOME overrides it
A special "THIS" tag means relative path of the current include file,
which makes it possible to include several files form the same directory where
a initial include file resides
# Start of config at /etc/pwget.conf
# THIS = /etc, current location
include <THIS/pwget-others.conf>
# Refers to directory where current user is: the pwd
include <pwget-others.conf>
# end
Configuration file example¶
The configuration file can contain many directives, where each
directive ends with a colon. The usage of each directive is best explained by
examining the configuration file below and reading the commentary near each
directive.
# $HOME/config/pwget.conf - Perl pwget configuration file
ROOT = $HOME # define variables
CONF = $HOME/config
UPDATE = $ROOT/updates
DOWNL = $ROOT/download
# Include more configuration files. It is possible to
# split a huge file in pieces and have "linux",
# "win32", "debian", "emacs" configurations in separate
# and manageable files.
INCLUDE <$CONF/pwget-other.conf>
INCLUDE <$CONF/pwget-more.conf>
tag1: local-copies tag1: local # multiple names to this category
lcd: $UPDATE # chdir directive
# This is shown to the user with option --verbose
print: Notice, this site moved YYYY-MM-DD, update your bookmarks
file://absolute/dir/file-1.23.tar.gz
tag1: external
lcd: $DOWNL
tag2: external-http
http://www.example.com/page.html
http://www.example.com/page.html save:/dir/dir/page.html
tag2: external-ftp
ftp://ftp.com/dir/file.txt.gz save:xx-file.txt.gz login:foo pass:passwd x:
lcd: $HOME/download/package
ftp://ftp.com/dir/package-1.1.tar.gz new:
tag2: package-x
lcd: $DOWNL/package-x
# Person announces new files in his homepage, download all
# announced files. Unpack everything (x:) and remove any
# existing directories (xopt:rm)
http://example.com/~foo pregexp:\.tar\.gz$ x: xopt:rm
# End of configuration file pwget.conf
LIST OF DIRECTIVES IN CONFIGURATION FILE¶
All the directives must be on the same line as the URL. The program scans
lines and determines all options given on the line for that URL. Directives
can be overridden by command line options.
- cnv:CONVERSION
- Currently only cnv:text is available.
Convert downloaded page to text. This option always needs either
save: or rename:, because only those directives change
filename. Here is an example:
http://example.com/dir/file.html cnv:text save:file.txt
http://example.com/dir/ pregexp:\.html cnv:text rename:s/html/txt/
A text: shorthand directive can be used instead of
cnv:text.
- cregexp:REGEXP
- Download the file only if its content matches REGEXP. This is the
same as option --regexp-content. In this example, Emacs Lisp packages (.el)
from a directory listing are downloaded, but only if their content
indicates that the author is Mr. Foo:
http://example.com/index.html cregexp:(?i)author:.*Foo pregexp:\.el$
- lcd:DIRECTORY
- Set the local download directory to DIRECTORY (chdir to it).
Any environment variables are substituted in the path name. If this tag is
found, it overrides the --output setting. If the path is not a directory,
the program terminates with an error. See also --create-paths and
--no-lcd.
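A minimal configuration sketch (the directory is an assumed example):
lcd: $HOME/download/site
http://www.example.com/file.txt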
- login:LOGIN-NAME
- Ftp login name. Default value is
"anonymous".
- mirror:SITE
- This is relevant only to Sourceforge, which does not allow
direct downloads with links. Visit the project's Sourceforge homepage and see
which mirrors are available for downloading.
An example:
http://sourceforge.net/projects/austrumi/files/austrumi/austrumi-1.8.5/austrumi-1.8.5.iso/download new: mirror:kent
- new:
- Get newest file. This variable is reset to the value of
--new after the line has been processed. Newest means that an
"ls" command is run on the FTP server, or something equivalent for HTTP
"ftp directories", and any files that resemble the filename are
examined and sorted, and the latest one is heuristically determined according
to the version number in the file name. For example, files that have version
information in YYYYMMDD format will most likely be retrieved correctly.
Time stamps of the files are not checked.
The only requirement is that the filename "must" follow the universal
version numbering standard:
FILE-VERSION.extension # de facto VERSION is defined as [\d.]+
file-19990101.tar.gz # ok
file-1999.0101.tar.gz # ok
file-1.2.3.5.tar.gz # ok
file1234.txt # not recognized. Must have "-"
file-0.23d.tar.gz # warning, letters are problematic
Files that have some alphabetic version indicator at the end of VERSION may
not be handled correctly. Contact the developer and inform him about the
de facto standard so that files can be retrieved more intelligently.
NOTE: In order for the new: directive to know what kind of files
to look for, it needs a file template. You can use a direct link to some
filename. Here the location "http://www.example.com/downloads"
is examined and the filename template is taken as
"file-1.1.tar.gz" to search for files that might be newer, like
"file-9.1.10.tar.gz":
http://www.example.com/downloads/file-1.1.tar.gz new:
If the filename appears on a named page, use directive file: for the
template. In this case the "download.html" page is examined for
files looking like "file.*tar.gz" and the latest is searched:
http://www.example.com/project/download.html file:file-1.1.tar.gz new:
- overwrite: o:
- Same as turning on --overwrite.
- page:
- Read a web page and apply commands to it. An example: contact
the root page and save it:
http://example.com/~foo page: save:foo-homepage.html
In order to find the correct information from the page, other directives are
usually supplied to guide the searching.
1) Adding directive "pregexp:ARCHIVE-REGEXP" matches the A HREF
links in the page.
2) Adding directive new: instructs to find newer VERSIONS of the
file.
3) Adding directive "file:DOWNLOAD-FILE" tells what template to
use to construct the downloadable file name. This is needed for the
"new:" directive.
4) A directive "vregexp:VERSION-REGEXP" matches the exact location
in the page from where the version information is extracted. The default
regexp looks for line that says "The latest version ... is ...
N.N". The regexp must return submatch 2 for the version number.
AN EXAMPLE
Search for newer files in an HTTP directory listing. Examine the page
http://www.example.com/download/dir for the model
"package-1.1.tar.gz" and find a newer file. E.g.
"package-4.7.tar.gz" would be downloaded.
http://www.example.com/download/dir/package-1.1.tar.gz new:
AN EXAMPLE
Search for newer files from the content of the page. The directive
file: acts as a model for filenames to pay attention to.
http://www.example.com/project/download.html new: pregexp:tar.gz file:package-1.1.tar.gz
AN EXAMPLE
Use directive rename: to change the filename before storing it on
disk. Here, the version number is attached to the actual filename:
file.el-1.1
file.el-1.2
The directives needed would be as follows; the entries have been broken onto
separate lines for legibility:
http://example.com/files/
pregexp:\.el-\d
vregexp:(file.el-([\d.]+))
file:file.el-1.1
new:
rename:s/-[\d.]+//
This effectively reads: "See if there is a new version of something that
looks like file.el-1.1 and save it under the name file.el by deleting the
extra version number at the end of the original filename".
AN EXAMPLE
Contact the absolute page: at http://www.example.com/package.html and
search the A HREF URLs in the page that match pregexp:. In addition, do
another scan and search for the version number in the page at the position
that matches vregexp: (submatch 2).
After all the pieces have been found, use the template file: to construct the
retrievable file name using the version number found by vregexp:. The
actual download location is a combination of the page: URL and the A HREF
pregexp: location.
The directives needed would be as follows; the entries have been broken onto
separate lines for legibility:
http://www.example.com/~foo/package.html
page:
pregexp: package.tar.gz
vregexp: ((?i)latest.*?version.*?\b([\d][\d.]+).*)
file: package-1.3.tar.gz
new:
x:
An example of web page where the above would apply:
<HTML>
<BODY>
The latest version of package is <B>2.4.1</B> It can be
downloaded in several forms:
<A HREF="download/files/package.tar.gz">Tar file</A>
<A HREF="download/files/package.zip">ZIP file
</BODY>
</HTML>
For this example, assume that "package.tar.gz" is a symbolic link
pointing to the latest release file "package-2.4.1.tar.gz". Thus
the actual download location would have been
"http://www.example.com/~foo/download/files/package-2.4.1.tar.gz".
Why not simply download "package.tar.gz"? Because then the program
can't decide whether the version on the page is newer than the one stored on
disk from the previous download. With version numbers in the file names, the
comparison is possible.
- page:find
- FIXME: This option is obsolete. Do not use.
THIS IS FOR HTTP only. Use directive regexp: for FTP protocols.
This is a more general instruction than the page: and vregexp:
explained above.
Instructs the program to download every URL on the HTML page matching
pregexp:RE. In a typical situation the page maintainer lists his software on
a development page. This example would download every tar.gz file on the
page. Note that the REGEXP is matched against the A HREF link content,
not the actual text that is displayed on the page:
http://www.example.com/index.html page:find pregexp:\.tar.gz$
You can also use an additional regexp-no: directive if you want to
exclude files after pregexp: has matched a link.
http://www.example.com/index.html page:find pregexp:\.tar.gz$ regexp-no:desktop
- pass:PASSWORD
- For FTP logins. Default value is
"nobody@example.com".
- pregexp:RE
- Search A HREF links in page matching a regular expression.
The regular expression must be a single word with no whitespace. This is
incorrect:
pregexp:(this regexp )
It must be written as:
pregexp:(this\s+regexp\s)
- print:MESSAGE
- Print the associated message to the user when the matching tag
name is requested. This directive must be on a separate line inside the tag.
tag1: linux
print: this download site moved 2002-02-02, check your bookmarks.
http://new.site.com/dir/file-1.1.tar.gz new:
The "print:" directive for tag is shown only if user turns on
--verbose mode:
pwget -v -T linux
- rename:PERL-CODE
- Rename each file using PERL-CODE. The PERL-CODE must be a
complete Perl program with no spaces anywhere. The following variables are
available during the eval() of the code:
$ARG = current file name
$url = complete url for the file
The code must return $ARG, which is used for the file name.
For example, if a page contains links to .html files that are in fact text
files, the following statement would change the file extensions:
http://example.com/dir/ page:find pregexp:\.html rename:s/html/txt/
You can also call the function "MonthToNumber($string)" if the
filename contains a written month name, like "2005-February.mbox". The
function will convert the name into a number. Many mailing list archives can
be downloaded cleanly this way.
# This will download SA-Exim Mailing list archives:
http://lists.merlins.org/archives/sa-exim/ pregexp:\.txt$ rename:$ARG=MonthToNumber($ARG)
Here is a more complicated example:
http://www.contactor.se/~dast/svnusers/mbox.cgi pregexp:mbox.*\d$ rename:my($y,$m)=($url=~/year=(\d+).*month=(\d+)/);$ARG="$y-$m.mbox"
Let's break that one apart. You may want to spend some time with this
example, since the possibilities are limitless.
1. Connect to page
http://www.contactor.se/~dast/svnusers/mbox.cgi
2. Search the page for URLs matching the regexp 'mbox.*\d$'. A
found link could look like this:
http://svn.haxx.se/users/mbox.cgi?year=2004&month=12
3. The found link is put into $ARG (same as $_), which can be used
to extract a suitable mailbox name with Perl code that is
evaluated. The resulting name must appear in $ARG. Thus the code
effectively extracts two items from the link to form a mailbox
name:
my ($y, $m) = ( $url =~ /year=(\d+).*month=(\d+)/ );
$ARG = "$y-$m.mbox";
=> 2004-12.mbox
Just remember that the Perl code following the "rename:" directive
must not contain any spaces. It must all be readable as one
string.
- regexp:REGEXP
- Get all files in an FTP directory matching the regexp. Directive
save: is ignored.
- regexp-no:REGEXP
- After the "regexp:" directive has matched,
exclude files that match directive regexp-no:
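A sketch combining the two directives: download all tar.gz files from an FTP
directory, but exclude beta releases (the names are assumptions):
ftp://ftp.example.com/pub/project/ regexp:\.tar\.gz$ regexp-no:beta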
- Regexp:REGEXP
- This option is for interactive use. Retrieve all files from an
HTTP or FTP site which match REGEXP.
- save:LOCAL-FILE-NAME
- Save file under this name to local disk.
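For example (a sketch; the local name is an arbitrary assumption):
ftp://ftp.example.com/dir/README save:readme-project.txt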
- tagN:NAME
- Downloads can be grouped under "tagN" so that
e.g. option --tag NAME would start downloading files from that point on
until the next "tag1" is found. Tag levels tag1, tag2, tag3, ... can be
nested without limit, so you can arrange your downloads
hierarchically in the configuration file. For example, to download all Linux
files that you monitor, you would give option --tag linux. To
download only the latest NT Emacs binary, you would give option --tag
emacs-nt. Notice that you do not give the "level" in the
option; the program will find it out from the configuration file when the tag
name matches.
The downloading stops at the next tag of the "same level". That is,
tag2 stops only at the next tag2, or when an upper-level tag (tag1) is found,
or at end of file.
tag1: linux # All Linux downloads under this category
tag2: sunsite tag2: another-name-for-this-spot
# List of files to download from here
tag2: ftp.funet.fi
# List of files to download from here
tag1: emacs-binary
tag2: emacs-nt
tag2: xemacs-nt
tag2: emacs
tag2: xemacs
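Given the layout above, sketched invocations could be:
pwget --verbose --config $HOME/config/pwget.conf --tag linux
pwget --verbose --config $HOME/config/pwget.conf --tag emacs-nt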
- x:
- Extract (unpack) the file after download. See also options
--extract and --no-extract. The archive file, say .tar.gz, will
be extracted in the current download location (see directive
lcd:).
The unpack procedure checks the contents of the archive to see if the
package is correctly formed. The de facto archive format is
package-N.NN.tar.gz
In the archive, all files are supposed to be stored under the proper
subdirectory with version information:
package-N.NN/doc/README
package-N.NN/doc/INSTALL
package-N.NN/src/Makefile
package-N.NN/src/some-code.java
"IMPORTANT:" If the archive does not have a subdirectory for all
files, a subdirectory is created and all items are unpacked under it. The
defualt subdirectory name in constructed from the archive name with
currect date stamp in format:
package-YYYY.MMDD
If the archive name contains something that looks like a version number, the
created directory will be constructed from it, instead of the current date.
package-1.43.tar.gz => package-1.43
- xx:
- Like directive x: but extract the archive "as
is", without checking the content of the archive. If you know that it is
OK for the archive not to include any subdirectories, use this option to
suppress the creation of an artificial root directory package-YYYY.MMDD.
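A one-line sketch (the archive name is an assumption):
http://www.example.com/files.tar.gz xx: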
- xopt:rm
- This option tells the program to remove any previous unpack directory.
Sometimes the files in the archive are all read-only, and unpacking the
archive a second time, after some period of time, would display
tar: package-3.9.5/.cvsignore: Could not create file:
Permission denied
tar: package-3.9.5/BUGS: Could not create file:
Permission denied
This is not a serious error, because the archive was already on disk and tar
did not overwrite the previous files. It might be good to inform the archive
maintainer that the files have wrong permissions. It is customary to
expect that distributed packages have the writable flag set for all
files.
ERRORS¶
Here is a list of possible error messages and how to deal with them. Turning
on --debug will help you understand how the program has interpreted the
configuration file or command line options. Pay close attention to the
generated output, because it may reveal that a regexp for a site is too loose
or too tight.
- ERROR {URL-HERE} Bad file descriptor
- This is "file not found error". You have written
the filename incorrectly. Double check the configuration file's line.
BUGS AND LIMITATIONS¶
"Sourceforge note": To download archive files from Sourceforge
requires some trickery because of the redirections and load balancers the site
uses. The Sourceforge page have also undergone many changes during their
existence. Due to these changes there exists an ugly hack in the program to
use
wget(1) to get certain infomation from the site. This could have
been implemented in pure Perl, but as of now the developer hasn't had time to
remove the
wget(1) dependency. No doubt, this is an ironic situation to
use
wget(1). You you have Perl skills, go ahead and look at
UrlHttGet().
UrlHttGetWget() and sen patches.
The program was initially designed to read options from one line. It is
unfortunately not possible to change the program to read configuration file
directives from multiple lines, e.g. by using backslashes (\) to indicate a
continued line.
ENVIRONMENT¶
Variable "PWGET_CFG" can point to the root configuration file. The
configuration file is read at startup if it exists.
export PWGET_CFG=$HOME/conf/pwget.conf # /bin/bash syntax
setenv PWGET_CFG $HOME/conf/pwget.conf # /bin/csh syntax
EXIT STATUS¶
Not defined.
DEPENDENCIES¶
External utilities:
wget(1) only needed for Sourceforge.net downloads
see BUGS AND LIMITATIONS
Non-core Perl modules from CPAN:
LWP::UserAgent
Net::FTP
The following modules are loaded at run time only if the directive
cnv:text
is used. Otherwise these modules are not loaded:
HTML::Parse
HTML::TextFormat
HTML::FormatText
This module is loaded at run time only if the HTTPS scheme is used:
Crypt::SSLeay
SEE ALSO¶
lwp-download(1) lwp-mirror(1) lwp-request(1)
lwp-rget(1)
wget(1)
AUTHOR¶
Jari Aalto
LICENSE AND COPYRIGHT¶
Copyright (C) 1996-2010 Jari Aalto
This program is free software; you can redistribute and/or modify this
program under the terms of the GNU General Public License, either version 2
of the License, or (at your option) any later version.