NAME
checklink - check the validity of links in an HTML or XHTML document
SYNOPSIS
checklink [options] uri ...
DESCRIPTION
This manual page briefly documents the checklink command, a.k.a. the W3C®
  Link Checker.
checklink is a program that reads an HTML or XHTML document, extracts a
  list of anchors and links, and checks that no anchor is defined twice and
  that all the links, including their fragments, are dereferenceable. It warns
  about HTTP redirects, including directory redirects, and can recursively
  check part of a web site.
The program can be used either as a command line tool or as a CGI script.
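
The core checks described above can be illustrated with a minimal Python sketch. It is not checklink's implementation (checklink is written in Perl) and it makes simplifying assumptions: only `id` attributes and `<a name>` count as anchor definitions, only `href` and `src` values are collected as links, and only same-document fragments ("#foo") are verified; remote URIs are not dereferenced. The class and function names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorLinkCollector(HTMLParser):
    """Hypothetical collector of anchor definitions and link targets."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # every anchor definition, in document order
        self.links = []    # every href/src value, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element's id attribute defines a fragment target.
        if attrs.get("id"):
            self.anchors.append(attrs["id"])
        # <a name="..."> also defines one; an <a> whose id and name are
        # equal is counted as a single anchor, not two.
        if tag == "a" and attrs.get("name") and attrs.get("name") != attrs.get("id"):
            self.anchors.append(attrs["name"])
        for attr in ("href", "src"):
            if attrs.get(attr):
                self.links.append(attrs[attr])


def check_document(html_text):
    """Return (duplicate anchors, all links, broken same-document fragments)."""
    parser = AnchorLinkCollector()
    parser.feed(html_text)
    parser.close()
    counts = Counter(parser.anchors)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    # A fragment-only link must point at an anchor defined in this document.
    broken_fragments = [
        link for link in parser.links
        if link.startswith("#") and link[1:] not in counts
    ]
    return duplicates, parser.links, broken_fragments
```

For example, a document that gives two elements id="top" and links to "#missing" would report "top" as a duplicate anchor and "#missing" as a broken fragment.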
OPTIONS
This program follows the usual GNU command line syntax, with long options
  starting with two dashes (`--'). A summary of options is included below.
  - -?, -h, --help
 
  - Show summary of options.
 
  - -V, --version
 
  - Output version information.
 
  - -s, --summary
 
  - Show result summary only.
 
  - -b, --broken
 
  - Show only the broken links, not the redirects.
 
  - -e, --directory
 
  - Hide directory redirects, e.g. <http://www.w3.org/TR> ->
      <http://www.w3.org/TR/>.
 
  - -r, --recursive
 
  - Check the documents linked from the first one.
 
  - -D, --depth n
 
  - Check the documents linked from the first one to depth n (implies
      --recursive).
 
  - -l, --location uri
 
  - Scope of the documents checked (implies --recursive). Can be
      specified multiple times to give multiple recursion bases. If the URI of
      a candidate document is downwards relative to any of the bases, it is
      considered to be within the scope. If not specified, the default is the
      base URI of the initial document; for example, for
      <http://www.w3.org/TR/html4/Overview.html> it would be
      <http://www.w3.org/TR/html4/>.
 
  - -X, --exclude regexp
 
  - Do not check links whose full, canonical URIs match regexp. Note
      that this option limits recursion the same way as --exclude-docs
      with the same regular expression would.
 
  - --exclude-docs regexp
 
  - In recursive mode, do not check links in documents whose full, canonical
      URIs match regexp. This option may be specified multiple times.
 
  - --suppress-redirect URI->URI
 
  - Do not report a redirect from the first to the second URI. The
      "->" is literal text. This option may be specified multiple
      times. Whitespace may be used instead of "->" to separate the
      URIs.
 
  - --suppress-redirect-prefix URI->URI
 
  - Do not report a redirect from a child of the first URI to the same child
      of the second URI. The "->" is literal text. This option
      may be specified multiple times. Whitespace may be used instead of
      "->" to separate the URIs.
 
  - --suppress-temp-redirects
 
  - Do not report warnings about temporary redirects.
 
  - --suppress-broken CODE:URI
 
  - Do not report a broken link with the given CODE. CODE is the HTTP
      response status code, or -1 for robots exclusion. The ":" is literal text.
      This option may be specified multiple times. Whitespace may be used
      instead of ":" to separate the CODE and the URI.
 
  - --suppress-fragment URI
 
  - Do not report the given broken fragment URI. A fragment URI contains
      "#". This option may be specified multiple times.
 
  - -L, --languages accept-language
 
  - The "Accept-Language" HTTP header to send. In command line mode,
      this header is not sent by default. The special value "auto"
      causes a value to be detected from the "LANG" environment
      variable, and sent if found. In CGI mode, the default is to send the value
      received from the client as is.
 
  - -c, --cookies cookie-file
 
  - Use cookies, loading and saving them in cookie-file. The special value
      "tmp" causes non-persistent use of cookies, i.e. they are used
      but only stored in memory for the duration of this link checker run.
 
  - -R, --no-referer
 
  - Do not send the "Referer" HTTP header.
 
  - -q, --quiet
 
  - No output if no errors are found. Implies --summary.
 
  - -v, --verbose
 
  - Verbose mode.
 
  - -i, --indicator
 
  - Show progress while parsing as percentage of lines processed. No indicator
      is shown for documents containing no linefeeds.
 
  - -u, --user username
 
  - Specify a username for authentication.
 
  - -p, --password password
 
  - Specify a password for authentication.
 
  - --hide-same-realm
 
  - Hide 401 (Unauthorized) responses that are in the same realm as the
      document checked.
 
  - -S, --sleep secs
 
  - Sleep the specified number of seconds between requests to each server.
      Defaults to 1 second, which is also the minimum allowed.
 
  - -t, --timeout secs
 
  - Timeout for requests, in seconds. The default is 30.
 
  - -C, --connection-cache number
 
  - Maximum number of cached connections. Using this option overrides the
      "Connection_Cache_Size" configuration file parameter; see its
      documentation below for the default value and more information.
 
  - -d, --domain domain
 
  - Perl regular expression describing the domain to which the authentication
      information (if present) will be sent. The default value can be specified
      in the configuration file. See the "Trusted" entry in the
      configuration file description below for more information.
 
  - --masquerade "real-prefix surrogate-prefix"
 
  - Perform a simple string substitution: URIs which begin with the string
      "real-prefix" are rewritten using the
      "surrogate-prefix" before being dereferenced. Useful for making
      a local directory masquerade as a remote one. For example:
    
    
  --masquerade "http://example.com/x/y/z/ file:///my/local/dir/"
    
    
    If the document being checked contains a link to
      http://example.com/x/y/z/foo.html, then the local file system will be
      checked for file:///my/local/dir/foo.html.
    
     --masquerade takes a single argument consisting of two URIs,
      separated by whitespace. The quote marks are not part of the argument;
      enclosing the value in quotes is simply the usual way to pass a value
      that contains whitespace.
  - -H, --html
 
  - HTML output.
 
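
Several of the options above (--location, --suppress-redirect, --suppress-broken and --masquerade) boil down to string operations on URIs. The following Python sketch shows one plausible reading of them; it is not checklink's Perl code. It assumes plain string-prefix matching on unnormalized URIs, and every function name is hypothetical.

```python
import re
from urllib.parse import urljoin


def default_scope(initial_uri):
    """Default --location base: the directory part of the initial URI."""
    # Resolving "." against the URI drops its last path segment.
    return urljoin(initial_uri, ".")


def in_scope(candidate_uri, bases):
    """In scope if the candidate is 'downwards' from any base.

    Simplification: a plain string-prefix test, no URI normalization.
    """
    return any(candidate_uri.startswith(base) for base in bases)


def parse_uri_pair(arg):
    """Split a --suppress-redirect(-prefix) 'URI->URI' or 'URI URI' argument."""
    if "->" in arg:
        first, second = arg.split("->", 1)
    else:
        first, second = arg.split(None, 1)
    return first.strip(), second.strip()


# CODE, then ':' or whitespace, then the URI (assumed to contain no spaces).
_BROKEN_SPEC = re.compile(r"^\s*(-?\d+)\s*(?::|\s)\s*(\S+)\s*$")


def parse_broken_spec(arg):
    """Split a --suppress-broken 'CODE:URI' or 'CODE URI' argument."""
    match = _BROKEN_SPEC.match(arg)
    if match is None:
        raise ValueError(f"not a CODE:URI pair: {arg!r}")
    return int(match.group(1)), match.group(2)


def masquerade(uri, spec):
    """Apply a --masquerade 'real-prefix surrogate-prefix' rewrite."""
    real_prefix, surrogate_prefix = spec.split()  # exactly two URIs expected
    if uri.startswith(real_prefix):
        return surrogate_prefix + uri[len(real_prefix):]
    return uri
```

With the default scope for <http://www.w3.org/TR/html4/Overview.html>, for instance, a link to <http://www.w3.org/TR/xhtml1/> falls outside the scope, so in_scope() returns False and the document would not be followed recursively.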
FILES
  - /etc/w3c/checklink.conf
 
  - The main configuration file. You can use the W3C_CHECKLINK_CFG environment
      variable to override the default location.
    
    "Trusted" specifies a regular expression for matching trusted
      domains (i.e. domains to which HTTP basic authentication credentials, if
      any, will be sent). The regular expression is matched case-insensitively
      against host names. When it is unset, the default behavior is to send the
      authentication information only to the host that requests it; usually you
      don't want to change this. For example, the following configures only
      hosts in the w3.org domain as trusted:
    
    
    Trusted = \.w3\.org$
    
    
    "Allow_Private_IPs" is a boolean flag indicating whether checking
      links on non-public IP addresses is allowed. The default is true in
      command line mode and false when run as a CGI script. For example, to
      disallow checking non-public IP addresses, regardless of the mode, use:
    
       Allow_Private_IPs = 0
    
    
    "Forbidden_Protocols" is a comma-separated list of additional
      protocols/URI schemes that the link checker is not allowed to use. The
      "javascript" and "mailto" schemes are always
      forbidden, and so is the "file" scheme when running as a CGI
      script.
    
       Forbidden_Protocols = javascript,mailto
    
    
    "Markup_Validator_URI" and "CSS_Validator_URI" are URI
      templates for the respective validators. The %s in each is replaced
      with the full, URI-encoded URI of the document being checked, and the
      resulting links are shown in the results view of the online/CGI
      version. The defaults are:
    
       Markup_Validator_URI = http://validator.w3.org/check?uri=%s
       CSS_Validator_URI = http://jigsaw.w3.org/css-validator/validator?uri=%s
    
    
    "Doc_URI" is the URI used in the link checker's dynamically generated
      content for links to the documentation and to its CSS and JavaScript
      files. The default is:
    
       Doc_URI = http://validator.w3.org/docs/checklink.html
    
    
    "Connection_Cache_Size" is an integer denoting the maximum number
      of connections the link checker will keep open at any given time. The
      default is:
    
       Connection_Cache_Size = 2
    
   
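
The configuration parameters above can be approximated in Python as follows. This is a sketch, not checklink's implementation: Python's re module stands in for Perl regular expressions, the ipaddress module's is_global flag stands in for checklink's notion of a public address, host names are not resolved to addresses, and urllib.parse.quote with no safe characters is assumed to produce the "URI encoded" form. All function names are hypothetical.

```python
import ipaddress
import re
from urllib.parse import quote, urlsplit

# Schemes the manual says are always forbidden.
ALWAYS_FORBIDDEN = {"javascript", "mailto"}


def is_trusted(host, trusted_regexp):
    """Trusted: case-insensitive regexp match against the host name."""
    return re.search(trusted_regexp, host, re.IGNORECASE) is not None


def is_non_public_address(host):
    """Allow_Private_IPs: True if host is a literal non-public IP address."""
    try:
        return not ipaddress.ip_address(host).is_global
    except ValueError:
        return False  # a host name, not an IP literal (not resolved here)


def is_forbidden(uri, extra_protocols=(), cgi_mode=False):
    """Forbidden_Protocols: built-in plus configured forbidden schemes."""
    forbidden = ALWAYS_FORBIDDEN | {p.strip().lower() for p in extra_protocols}
    if cgi_mode:
        forbidden.add("file")  # file: is also forbidden in CGI mode
    return urlsplit(uri).scheme.lower() in forbidden


def validator_link(template, document_uri):
    """Markup_Validator_URI / CSS_Validator_URI: fill %s with the encoded URI."""
    return template.replace("%s", quote(document_uri, safe=""))
```

Note that with Trusted = \.w3\.org$ the bare host name "w3.org" does not match, because the expression requires a leading dot.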
ENVIRONMENT
checklink uses the libwww-perl library which has a number of environment
  variables affecting its behaviour. See "SEE ALSO" for some pointers.
  - W3C_CHECKLINK_CFG
 
  - If set, overrides the path to the configuration file.
 
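
Resolving the configuration file location can be sketched in Python as follows. The function name is hypothetical, and treating an empty W3C_CHECKLINK_CFG value as unset is an assumption, not documented behavior.

```python
import os

DEFAULT_CONFIG_PATH = "/etc/w3c/checklink.conf"


def config_path(environ=None):
    """Return the config file path, honouring W3C_CHECKLINK_CFG if set."""
    env = os.environ if environ is None else environ
    # Assumption: an empty value falls back to the default location.
    return env.get("W3C_CHECKLINK_CFG") or DEFAULT_CONFIG_PATH
```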
SEE ALSO
The documentation for this program is available on the web at
  <http://validator.w3.org/docs/checklink.html>.
LWP, Net::FTP, Net::NNTP, Net::IP, perlre.
AUTHOR
This program was originally written by Hugo Haas <hugo@w3.org>, based on
  Renaud Bruyeron's checklink.pl. It has been enhanced by Ville Skyttae
  and many other volunteers since. Use the <www-validator@w3.org> mailing
  list for feedback, and see
  <http://validator.w3.org/docs/checklink.html#csb> for more information.
This manual page was originally written by Frederic Schuetz
  <schutz@mathgen.ch> for the Debian GNU/Linux system (but may be used by
  others).
COPYRIGHT
This program is licensed under the W3C® Software License,
  <http://www.w3.org/Consortium/Legal/copyright-software>.