VISITORS(1) | General Commands Manual | VISITORS(1) |
NAME¶
visitors - a fast web server log analyzer
SYNOPSIS¶
visitors [options] <filename> [<filename> ...]
DESCRIPTION¶
Visitors generates access statistics from specified web log files.
- Requested pages
- Requested images
- Referers by number of visits and age
- Unique visitors in each day
- Page views per visit
- Pages accessed by the Google crawler (and the date of Google's last access on every page)
- Pages accessed by the AdSense crawler (and the date of AdSense's last access on every page)
- Percentage of visits originated from Google searches for every day
- User navigation patterns (web trails)
- Keyphrases used in Google searches
- Human languages used in Google searches
- User agents
- Weekdays and Hours distributions of accesses
- Weekdays/Hours combined bidimensional map
- Month/Day combined bidimensional map
- Visual path analysis with Graphviz
- Operating systems, browsers and domains popularity
- Visitors screen resolution and color depth
- 404 errors
Available options:¶
- -A --all
- Activate all the optional reports. This option is equivalent to -GKUWRDOB. Note that --trails is not implicitly included in this option because it also requires --prefix. See the --trails option documentation for details.
- -T --trails
- Enable the Web Trails feature. The report shows the most frequent moves between pages of your site. This option requires the --prefix option to work.
- -G --google
- Activate two reports about pages accessed by the Google and AdSense web crawlers. Pages are shown ordered according to the last time the Google web crawler requested the page. The first page shown is the most recently accessed one.
- -K --google-keyphrases
- Activate a report that shows common search keyphrases used to find your web site from Google.
- -Z --google-keyphrases-age
- Activate a report that shows the latest keyphrases used to find your site from Google.
- -H --google-human-language
- Activate a report that shows common human languages used to search from Google. This feature uses the 'hl' variable of the Google referer URL.
- -U --user-agents
- Show information about common user agents.
- -W --weekday-hour-map
- Activate the generation of a combined weekdays/hours bidimensional map that shows traffic for each of the 168 hours of a 7-day week. Brighter colors mean higher traffic. This is ideal for figuring out the best moment in the week for maintenance downtime, what the target audience of the site is, whether people are accessing it from work or from home, and so on. The map is generated as pure HTML inside the report.
- -M --month-day-map
- Activate the generation of a combined month/day bidimensional map that shows traffic for each of the 365 days of the year. Brighter colors mean higher traffic. This is useful for spotting at a glance traffic trends and days with particularly high or low traffic. The map is generated as pure HTML inside the report.
- -R --referers-age
- Shows referers ordered by age. The 'age' of a referer is the date it appeared the first time. In the report, newer referers are on top. This report is useful to check for new external links.
- -D --domains
- Activate the generation of information about Top Level Domains popularity. This information may be useful to guess the amount of visits from different countries. Note that Visitors will not resolve numerical IP addresses if they are not already resolved in the log file. All the unresolved IP addresses will be shown in this report under the entry Unresolved IP.
- -O --operating-systems
- Activate the report about Operating Systems popularity, sorted by number of accesses. All the common operating systems are listed in the report, while unknown operating systems will be summed in the unknown entry.
- -B --browsers
- Activate the report about Browsers popularity, sorted by number of accesses. All the common browsers are listed in the report, while unknown browsers will be summed in the unknown entry. Browsers are listed by family (for example Internet Explorer, Opera, and so on), and not by specific version.
- -X --error404
- Activate the generation of the missing documents (404 error) report. This report shows files that were requested but are missing, ordered by number of requests. The report is useful for discovering files that are missing from the web site by mistake, but often you will also see bizarre requests performed by users, internet worms and security scans.
- -Y --pageviews
- Activate the generation of a report that shows an approximation of the percentage of pages viewed per unique visit. The goal of this report is to understand the usage pattern of the site and the level of interest of the visitors. For example, in a site that provides a number of pages with interesting content, visitors performing a single page view per visit were probably searching for something else.
- -S --robots
- Activate the generation of a report that shows user agents of clients requesting the file robots.txt, with the exception of the MSIE Crawler requests. The result is a list of web robots and spiders that accessed your web site, ordered by number of requests of robots.txt.
- --screen-info
- Activate the screen resolution and color depth reports. Note that for this report to work you have to insert into your HTML pages the JavaScript code you can find in the README file in the visitors tarball.
- --stream
- Enable the Stream Mode (see the STREAM MODE DETAILS section for more information). In short: when in stream mode, Visitors will process all the log files specified (possibly none, which is valid in this mode) as usual, producing the report. Then the stream mode is entered and Visitors will start to read a continuous stream of web logs from standard input, updating the statistics incrementally as new data becomes available. A new report is produced periodically if new data arrived, according to the --update-every option (the default is to update the statistics every ten minutes). It's possible to ask Visitors to reset the statistics after some period of time using the --reset-every option. This gives you a snapshot of what has been going on in the last five minutes, hour, day or week. Note that --stream requires --output-file because Visitors needs to overwrite the report at every update, so it can't write to standard output as usual. If you plan to use the stream mode, also check the --tail option.
- --update-every seconds
- By default in Stream Mode statistics are updated every 10 minutes. This option specifies a different period in seconds.
- --reset-every seconds
- By default in Stream Mode statistics are never reset, but continuously updated incrementally. This option resets the statistics after the given amount of time in seconds. This is useful to have a snapshot of the web site usage.
- -f --output-file file
- Write output to file instead of stdout.
- -m --max-lines number
- Set the max number of entries that should be shown in reports like referers, keyphrases and so on. This option sets the max number of entries for all the reports at once.
- -r --max-referers number
- Set the max number of entries in the referer report.
- -p --max-pages number
- Set the max number of entries in the accessed pages report.
- -i --max-images number
- Set the max number of entries in the accessed images report.
- -x --max-error404 number
- Set the max number of entries in the missing documents report.
- -u --max-useragents number
- Set the max number of entries in the user agents report.
- -t --max-trails number
- Set the max number of entries in the web trails report.
- -g --max-googled number
- Set the max number of entries in the crawled pages report (google bot).
- --max-adsensed number
- Set the max number of entries in the crawled pages report (adsense bot).
- -k --max-google-keyphrases number
- Set the max number of entries in the Google keyphrases report.
- -a --max-referers-age number
- Set the max number of entries in the referers by date report.
- -d --max-domains number
- Set the max number of entries in the domains report.
- -P --prefix string
- A prefix tells Visitors what a link must look like to
be classified as internal to your site. This option is required for
--trails and also has the nice effect of keeping internal
links out of the referers report. If you are analyzing statistics
for http://www.your.site.com/, just use: --prefix
http://www.your.site.com
- -o --output html|text
- Output module. You can use text or html. The default is html.
- -V --graphviz
- This option enables the Graphviz mode: Visitors will
analyze the log file and create a graph describing the access patterns of
your web site. The information used to create the graph is the same as the
web trails report (that you can enable with --trails), but as a graph it
can be more readable for non-trivial sites. An example of how to use this
feature:
--graphviz > graph.dot
- -V --graphviz-ignorenode-google
- Don't put the google node on the generated graph. Only useful with --trails
- -V --graphviz-ignorenode-external
- Don't put the external referer node on the generated graph. Only useful with --trails
- -V --graphviz-ignorenode-noreferer
- Don't put the node indicating requests without referer on the generated graph. Only useful with --trails
- --tail
- When this option is specified, Visitors will emulate the Unix command tail -f --max-unchanged-stats=1 -q. You can specify the names of the log files to monitor for changes; once new data is appended to any of the specified files, Visitors will write the new data to standard output. This option is useful in conjunction with the Stream Mode (--stream). Files can be log-rotated because Visitors in Tail Mode will always try to reopen the file to check for changes.
- --time-delta delta
- If your web server is in a different timezone than most of your visitors or yourself, you will notice a shift in the reports regarding time and days of week. By default, Visitors will generate output using the host's locale. You can use the --time-delta option to adjust the output. Positive values shift the output to the right (toward the future) by the given number of hours; negative values shift it to the left (toward the past). In the future this option may support specifying the output timezone directly.
- --filter-spam
- Filter referer spam using a keyword-based filter (see blacklist.h for more information on keywords). If you don't know what referer spam is, check this Wikipedia page: http://en.wikipedia.org/wiki/Referer_spam
- --ignore-404
- When this option is turned on, log lines with 404 errors are used only to generate the 404 errors report and not for other reports.
- --grep pattern
- Process only log lines matching the specified pattern. Patterns are matched using glob-style matching (the kind used by the Unix shell):
- *
- Matches any sequence of characters in string, including a null string.
- ?
- Matches any single character in string.
- [chars]
- Matches any character in the set given by chars. If a sequence of the form x-y appears in chars, then any character between x and y, inclusive, will match.
- \x
- Matches the single character x. This provides a way of avoiding the special interpretation of the characters *?[]\ in pattern.
- --exclude pattern
- Works exactly like --grep, but only lines NOT matching the specified pattern are processed. Note that --grep and --exclude can be used multiple times, and are processed sequentially. For example, visitors --grep firefox --exclude download will process only lines including the string firefox but not including the string download.
- --debug
- Show additional information on errors. For example invalid lines are printed on the standard error if found. Mainly useful for developers and error reporting.
- -h --help
- Show usage and copyright information.
- -v --version
- Show program version.
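The combined effect of --grep and --exclude described above can be approximated with the shell's own glob matching. The following is only an illustrative sketch, not Visitors' implementation: the function name passes_filters is hypothetical, both patterns must be non-empty, and wrapping each pattern in `*...*` is an assumption made to mirror the "lines including the string" wording of the --exclude example. Case handling in Visitors may also differ from the shell's.

```shell
#!/bin/sh
# passes_filters LINE GREP_PATTERN EXCLUDE_PATTERN
# Succeeds (exit status 0) when LINE would be kept by a --grep filter
# followed by an --exclude filter, using shell glob semantics.
# Hypothetical helper for illustration only.
passes_filters() {
    line=$1
    grep_pat=$2
    exclude_pat=$3

    # --grep step: drop the line unless it matches the pattern.
    # The unquoted variable keeps *, ? and [chars] active as glob syntax.
    case $line in
        *$grep_pat*) ;;
        *) return 1 ;;
    esac

    # --exclude step: drop the line if it matches the pattern.
    case $line in
        *$exclude_pat*) return 1 ;;
    esac

    return 0
}
```

For instance, `passes_filters "$logline" firefox download` succeeds only for lines that contain firefox and do not contain download, which is the behaviour the --exclude entry describes.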
EXAMPLES¶
The simplest usage, to be used interactively when you have a web log to check (for example over ssh in your web server), just use:
--prefix http://www.hping.org > report.html
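As a hedged sketch assembled only from the options documented above, a fuller HTML report could be produced along these lines. The function name make_report, the log file name and the site URL are placeholders chosen for illustration, and the limit of 30 entries is arbitrary.

```shell
#!/bin/sh
# make_report LOGFILE SITE_PREFIX OUTPUT
# Hypothetical wrapper: enables all optional reports (-A), caps every
# report at 30 entries (-m 30), adds web trails (--trails needs --prefix)
# and writes HTML (-o html) from standard output into OUTPUT.
make_report() {
    log=$1
    site=$2
    out=$3
    visitors -A -m 30 --trails --prefix "$site" -o html "$log" > "$out"
}

# Example invocation (placeholder values):
#   make_report access.log http://www.example.com report.html
```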
STREAM MODE DETAILS¶
The usual way to run Visitors is to specify some options to control the report generation, and the names of the log files. For example, to generate a report from two Apache access log files you can write:
visitors --stream -A --update-every 60 \
--output-file /tmp/report.html
visitors --stream -A --update-every 60 \
--output-file /tmp/report.html
visitors --stream -A --update-every 30 --reset-every 3600 \
--output-file /tmp/report.html
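Since Stream Mode reads log lines from standard input and --tail writes newly appended lines to standard output, the two can be chained. The sketch below shows one possible arrangement; the log path, the report path and the function name start_stream_report are assumptions, not values taken from the Visitors distribution.

```shell
#!/bin/sh
# Hypothetical paths; adjust them for your system.
LOG=/var/log/apache2/access.log
REPORT=/tmp/report.html

start_stream_report() {
    # First process: follow the log file and print new lines on stdout.
    # Second process: read them from stdin and rewrite the report
    # every 60 seconds when new data has arrived.
    visitors --tail "$LOG" |
        visitors --stream -A --update-every 60 --output-file "$REPORT"
}

# Start the pipeline only when explicitly asked to, e.g.: sh script.sh run
if [ "${1:-}" = "run" ]; then
    start_stream_report
fi
```

Adding --reset-every to the second command would turn the report into a rolling snapshot, as described for that option.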
AUTHORS¶
Visitors was written by Salvatore Sanfilippo <antirez@invece.org>.
COPYING¶
Copyright (C) 2004,2005 Salvatore Sanfilippo <antirez@invece.org>. Visitors is distributed under the GNU General Public License. This manual page was written (based on the original HTML documentation) by Romain Francoise <rfrancoise@debian.org> for the Debian GNU/Linux system, but may be used by others. Salvatore Sanfilippo updated this man page starting from Visitors 0.5; this manual page is now part of the Visitors tarball.
April 2005 | Visitors 0.7 |