HB_REPORT(8) | Pacemaker documentation | HB_REPORT(8) |
NAME¶
hb_report - create report for CRM based clusters (Pacemaker)
SYNOPSIS¶
hb_report -f {time|"cts:"testnum} [-t time] [-u user] [-l file] [-n nodes] [-E files] [-p patt] [-L patt] [-e prog] [-MSDCZAQVsvhd] [dest]
DESCRIPTION¶
hb_report(8) is a utility that collects all information (logs, configuration files, system information, etc.) relevant to Pacemaker (CRM) over a given period of time.
OPTIONS¶
dest
-d
-f { time | "cts:"testnum }
-t time
-n nodes
-l file
-E files
-M
-L patt
-p patt
-Q
-A
-u user
-X ssh-options
-S
-Z
-V
-v
-h
-D (obsolete)
-e prog (obsolete)
-C (obsolete)
EXAMPLES¶
Last night during the backup there were several warnings encountered (logserver is the log host):
logserver# hb_report -f 3:00 -t 4:00 -n "node1 node2" report
collects everything from all nodes from 3am to 4am last night. The files are compressed to a tarball report.tar.bz2.
Just found a problem during testing:
# note the current time
node1# date
Fri Sep 11 18:51:40 CEST 2009
node1# /etc/init.d/heartbeat start
node1# nasty-command-that-breaks-things
node1# sleep 120 #wait for the cluster to settle
node1# hb_report -f 18:51 hb1

# if hb_report can't figure out that this is corosync
node1# hb_report -f 18:51 -A hb1

# if hb_report can't figure out the cluster members
node1# hb_report -f 18:51 -n "node1 node2" hb1
The files are compressed to a tarball hb1.tar.bz2.
INTERPRETING RESULTS¶
The compressed tar archive is the final product of hb_report. This is one example of its content, for a CTS test case on a three node OpenAIS cluster:
$ ls -RF 001-Restart
001-Restart:
analysis.txt events.txt logd.cf s390vm13/ s390vm16/ description.txt ha-log.txt openais.conf s390vm14/

001-Restart/s390vm13:
STOPPED crm_verify.txt hb_uuid.txt openais.conf@ sysinfo.txt cib.txt dlm_dump.txt logd.cf@ pengine/ sysstats.txt cib.xml events.txt messages permissions.txt

001-Restart/s390vm13/pengine:
pe-input-738.bz2 pe-input-740.bz2 pe-warn-450.bz2 pe-input-739.bz2 pe-warn-449.bz2 pe-warn-451.bz2

001-Restart/s390vm14:
STOPPED crm_verify.txt hb_uuid.txt openais.conf@ sysstats.txt cib.txt dlm_dump.txt logd.cf@ permissions.txt cib.xml events.txt messages sysinfo.txt

001-Restart/s390vm16:
STOPPED crm_verify.txt hb_uuid.txt messages sysinfo.txt cib.txt dlm_dump.txt hostcache openais.conf@ sysstats.txt cib.xml events.txt logd.cf@ permissions.txt
The top directory contains information which pertains to the cluster or the event as a whole. Files with exactly the same content on all nodes are also placed at the top, with per-node links created (in this example, that is the case for openais.conf and logd.cf).
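The handling of identical files can be illustrated with a minimal sketch. It is not hb_report's own code: the function name hoist_identical, the use of SHA-256 for comparison, and the relative symlink layout are assumptions made for illustration.

```python
import hashlib
import os


def hoist_identical(report_dir, nodes, filename):
    """If `filename` is byte-identical on all nodes, keep one copy at the top
    of the report and replace the per-node copies with relative symlinks.
    Hypothetical helper, not part of hb_report."""
    paths = [os.path.join(report_dir, n, filename) for n in nodes]
    if not all(os.path.isfile(p) for p in paths):
        return False
    digests = set()
    for p in paths:
        with open(p, "rb") as fh:
            digests.add(hashlib.sha256(fh.read()).hexdigest())
    if len(digests) != 1:
        return False  # contents differ, leave the per-node copies alone
    # Move the first copy to the top directory, then link every node to it.
    os.replace(paths[0], os.path.join(report_dir, filename))
    for p in paths:
        if os.path.lexists(p):
            os.remove(p)
        os.symlink(os.path.join("..", filename), p)
    return True
```

Called as hoist_identical("001-Restart", ["s390vm13", "s390vm14", "s390vm16"], "openais.conf"), this would leave one openais.conf at the top and a link in each node directory, matching the openais.conf@ entries in the listing.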
The cluster log files are named ha-log.txt regardless of the actual log file name on the system. If the log is found on the loghost, it is placed in the top directory. If not, the top directory ha-log.txt contains the logs of all nodes merged and sorted by time. Files named messages are excerpts of /var/log/messages from the nodes.
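Merging per-node logs by time can be sketched as follows. This is an illustration only: it assumes every line starts with a traditional syslog timestamp such as "Sep 11 18:51:40" and that each node's log is already in time order. The names syslog_time and merge_logs are hypothetical.

```python
import heapq
from datetime import datetime


def syslog_time(line):
    """Parse a traditional syslog prefix such as 'Sep 11 18:51:40'.
    That format carries no year, so strptime defaults it to 1900."""
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def merge_logs(per_node_lines):
    """Merge already time-ordered per-node logs into one time-ordered list."""
    return list(heapq.merge(*per_node_lines, key=syslog_time))
```

Because heapq.merge is stable, lines with equal timestamps keep the order in which the node logs were passed in.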
Most files are copied verbatim or contain the output of a command. For instance, cib.xml is a copy of the CIB found in /var/lib/heartbeat/crm/cib.xml, and crm_verify.txt is the output of the crm_verify(8) program.
Some files are the result of more involved processing:
analysis.txt
events.txt
permissions.txt
backtraces.txt
sysinfo.txt
sysstats.txt
description.txt should contain a user-supplied description of the problem, but since it is very seldom used, it will be dropped from future releases.
PREREQUISITES¶
ssh
sudo
Times
Core dumps
TIMES¶
Specifying times can at times be a nuisance. That is why we have chosen to use one of the Perl modules, which allows a certain freedom in how dates are written. You can either read the examples in the Date::Parse documentation <http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES> or just rely on common sense and try stuff like:
3:00 (today at 3am)
15:00 (today at 3pm)
2007/9/1 2pm (September 1st at 2pm)
Tue Sep 15 20:46:27 CEST 2009 (September 15th etc)
hb_report will (probably) complain if it can’t figure out what you mean.
Try to delimit the event as closely as possible in order to reduce the size of the report, while still leaving a minute or two around it for good measure.
-f is not optional. And don’t forget to quote dates when they contain spaces.
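To show how a bare time differs from a full date, here is a rough Python stand-in. It is not Date::Parse and accepts far fewer formats. The function parse_time, its today parameter, and the two format strings are assumptions for illustration, and the am/pm handling assumes an English or C locale.

```python
from datetime import date, datetime

# Formats that carry their own date (an illustrative subset only).
FORMATS_WITH_DATE = ["%Y/%m/%d %I%p", "%Y/%m/%d %H:%M"]


def parse_time(text, today):
    """Return a datetime for `text`; a bare 'H:MM' means that time on `today`."""
    try:
        clock = datetime.strptime(text, "%H:%M").time()
        return datetime.combine(today, clock)
    except ValueError:
        pass
    for fmt in FORMATS_WITH_DATE:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("cannot understand time: %r" % text)
```

With today set to date(2009, 9, 11), "3:00" resolves to 3am on that day, while "2007/9/1 2pm" resolves to September 1st 2007 at 14:00 regardless of today.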
SHOULD I SEND ALL THIS TO THE REST OF THE INTERNET?¶
By default, the sensitive data in CIB and PE files is not mangled by hb_report because that makes PE input files mostly useless. If you still have no other option but to send the report to a public mailing list and do not want the sensitive data to be included, use the -s option. Without this option, hb_report will issue a warning if it finds information which should not be exposed. By default, parameters matching passw.* are considered sensitive. Use the -p option to specify additional regular expressions to match variable names which may contain information you don’t want to leak. For example:
# hb_report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report
Heartbeat’s ha.cf is always sanitized. Logs and other files are not filtered.
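The pattern matching on parameter names can be pictured with the following sketch. It is not how hb_report is implemented: the sanitize function, the "****" mask, and matching the pattern at the start of the name are all assumptions made for illustration.

```python
import re

# The default named in the text; -p adds further patterns.
DEFAULT_PATTERNS = ["passw.*"]


def sanitize(params, extra_patterns=()):
    """Return a copy of `params` with the values of sensitive names masked.
    Hypothetical helper: patterns are matched at the start of the name."""
    patterns = [re.compile(p)
                for p in list(DEFAULT_PATTERNS) + list(extra_patterns)]
    cleaned = {}
    for name, value in params.items():
        sensitive = any(p.match(name) for p in patterns)
        cleaned[name] = "****" if sensitive else value
    return cleaned
```

With extra_patterns=["user.*", "secret.*"], mirroring the -p options in the command above, a parameter named user_id would be masked while ip would be left untouched.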
LOGS¶
It may be tricky to find syslog logs. The scheme used is to log a unique message on all nodes and then look it up in the usual syslog locations. This procedure is not foolproof, in particular if the syslog files are in a non-standard directory. We look in /var/log, /var/logs, /var/syslog, /var/adm, /var/log/ha and /var/log/cluster. If we can’t find the logs, please supply their location:
# hb_report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1
If you have different log locations on different nodes, well, perhaps you’d like to make them the same and make life easier for everybody.
Files starting with "ha-" are preferred. In case syslog sends messages to more than one file, if one of them is named ha-log or ha-debug those will be favoured over syslog or messages.
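The lookup and the preference for "ha-" files can be sketched as below. This only illustrates the scheme described above and is not hb_report's code. The function names are hypothetical, the sketch does not log the marker itself, and it reads each candidate file fully into memory.

```python
import os

# The directories named in the text.
SYSLOG_DIRS = ["/var/log", "/var/logs", "/var/syslog", "/var/adm",
               "/var/log/ha", "/var/log/cluster"]


def files_with_marker(marker, dirs=SYSLOG_DIRS):
    """Return regular files directly under `dirs` that contain `marker`."""
    hits = []
    for d in dirs:
        if not os.path.isdir(d):
            continue
        for name in sorted(os.listdir(d)):
            path = os.path.join(d, name)
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    if marker.encode() in fh.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file, skip it
    return hits


def preferred_log(paths):
    """Pick a file whose name starts with 'ha-' if there is one."""
    for p in paths:
        if os.path.basename(p).startswith("ha-"):
            return p
    return paths[0] if paths else None
```

If the marker turns up in both messages and ha-log, preferred_log returns ha-log. If it turns up nowhere, the result is None, which corresponds to the case where the location has to be supplied with -l.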
hb_report also supports archived logs in case the specified period extends that far into the past. The archives must reside in the same directory as the current log and their names must be prefixed with the name of the current log (for example syslog-1.gz or messages-20090105.bz2).
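The naming rule for archives can be made concrete with a short sketch. The helpers archived_logs and open_log are hypothetical, and choosing the decompressor from the file suffix is an assumption, not a description of hb_report's internals.

```python
import bz2
import gzip
import os


def archived_logs(current_log):
    """List files next to `current_log` whose names extend its name."""
    directory, base = os.path.split(current_log)
    directory = directory or "."
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(base) and name != base
    )


def open_log(path):
    """Open a plain, gzip or bzip2 compressed log for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```

For a current log /var/log/messages this would pick up /var/log/messages-20090105.bz2 but not /var/log/syslog-1.gz, since the latter belongs to a different current log.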
If there is no separate log for the cluster, possibly unrelated messages from other programs are included. We don’t filter logs, but just pick a segment for the period you specified.
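Picking a segment without filtering by program might look like the sketch below. It assumes traditional syslog timestamps at the start of each line, which carry no year, so the bounds must use the same default year (1900) that strptime assigns. The names are hypothetical, and lines without a parsable timestamp are simply dropped here.

```python
from datetime import datetime


def syslog_time(line):
    """Parse a prefix such as 'Sep 11 03:10:00'; the year defaults to 1900."""
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def segment(lines, start, end):
    """Keep lines stamped within [start, end], whatever program wrote them."""
    picked = []
    for line in lines:
        try:
            stamp = syslog_time(line)
        except ValueError:
            continue  # no parsable timestamp
        if start <= stamp <= end:
            picked.append(line)
    return picked
```

Note that a cron or sshd message inside the window is kept just like a cluster message, which is the behaviour the paragraph above describes.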
MANUAL REPORT COLLECTION¶
So, your ssh doesn’t work. In that case, you will have to run this procedure on all nodes. Use -S so that hb_report doesn’t bother with ssh:
# hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1
If you also have a log host which is not in the cluster, then you’ll have to copy the log to one of the nodes and tell us where it is:
# hb_report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1
OPERATION¶
hb_report collects files and other information in a fairly straightforward way. The most complex tasks are discovering the log file locations (if syslog is used, which is the most common case) and coordinating the operation on multiple nodes.
The instance of hb_report running on the host where it was invoked is the master instance. Instances running on other nodes are slave instances. The master instance communicates with the slave instances over ssh. There are multiple ssh invocations per run, so it is essential that ssh works without a password, i.e. with public key authentication and authorized_keys.
The operation consists of three phases. Each phase must finish on all nodes before the next one can commence. The first phase consists of logging unique messages through syslog on all nodes. This is the shortest of all phases.
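The rule that a phase must finish on all nodes before the next one starts amounts to a barrier between phases. A minimal sketch, assuming each phase is a hypothetical Python callable taking a node name (the real tool drives the slave instances over ssh):

```python
from concurrent.futures import ThreadPoolExecutor


def run_phases(nodes, phases):
    """Run each phase on all nodes; start the next only when all are done."""
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        for phase in phases:
            # list() waits for every node's result, which acts as the barrier.
            results.append(list(pool.map(phase, nodes)))
    return results
```

Within a phase the nodes are handled concurrently, but no node starts the next phase while another is still in the previous one.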
The second phase is the most involved. During this phase all local information is collected, which includes:
The third phase is collecting information from all nodes and analyzing it. The analysis consists of the following tasks:
BUGS¶
Finding logs may at times be extremely difficult, depending on how weird the syslog configuration is. It would be nice to ask syslog-ng developers to provide a way to find out the log destination based on facility and priority.
If you think you found a bug, please rerun with the -v option and attach the output to bugzilla.
hb_report can function in a satisfactory way only if ssh works to all nodes using authorized_keys (without password).
There are way too many options.
AUTHOR¶
Written by Dejan Muhamedagic, <dejan@suse.de>
RESOURCES¶
Pacemaker: <http://clusterlabs.org/>
Heartbeat and other Linux HA resources: <http://linux-ha.org/wiki>
OpenAIS: <http://www.openais.org/>
Corosync: <http://www.corosync.org/>
SEE ALSO¶
Date::Parse(3)
COPYING¶
Copyright (C) 2007-2009 Dejan Muhamedagic. Free use of this software is granted under the terms of the GNU General Public License (GPL).
2024-08-31 | hb_report |