NAME
salinfo_decode - decode Itanium SAL error records
SYNOPSIS
salinfo_decode [-d] [-i pct] [-s pct] [-l limit]
[-T filename] -t type -D directory
salinfo_decode [-d] filename
DESCRIPTION
salinfo_decode extracts and decodes CMC/CPE/MCA/INIT records from SAL. It
can decode a saved record from a file, or it can request records from the
kernel, decode them, save the raw and decoded records, and clear them from
SAL.
OPTIONS
- -d
- Each -d increments the debug level.
- -i pct
- A persistent error such as bad memory can generate a lot of
records. To prevent a persistent error from using all the inodes on the
filesystem containing the SAL logs, specify -i pct. If
the percentage of inodes used in that filesystem is
above pct, salinfo_decode stops writing records. The
records are still cleared from SAL, so they are lost forever. A count of
lost records is kept and written to syslog occasionally.
This option can only be used with -D. Note: Not all
filesystems have a fixed number of inodes; some dynamically add new
inodes as required. Using -i pct on such filesystems
makes little sense.
- -s pct
- Like -i pct, -s pct will stop
writing records if the percentage of space used on the SAL log filesystem
is above pct.
- -l limit
- If more than limit records of this type occur within
a minute, salinfo_decode drops the additional records. A count of lost
records is kept and written to syslog occasionally. This option can only
be used with -D. Note: The limit should be larger than
the number of cpus in your system to cope with reading the saved SAL
records at boot, when they are all processed at approximately the same
time.
- -T filename
- For each record that is written to the -D directory, salinfo_decode
writes a trigger line to filename. The trigger line contains the base
filename in the first field, followed by the options that were passed to
salinfo_decode, with -t type and
-D directory as the first two options. A post-processing
program can monitor the trigger file and perform any operation
on the raw or decoded records, including erasing them. If the
post-processing program erases the records, it should erase the decoded
record before the raw record, to avoid conflicts with the calculation of
the filename suffix. If writing to filename would block,
salinfo_decode discards the trigger line. A count of lost
triggers is kept and written to syslog occasionally. This option can only
be used with -D.
- -t type
- Specifies the type of record to monitor. Must be one of
"cmc", "cpe", "mca", or "init" in
lower case. The type is used as the third component of the paths
/proc/sal/type/event and /proc/sal/type/data.
- -D directory
- Specifies the directory where the raw records and the
decoded text will be written. The raw record is written to
directory/raw and the decoded text to
directory/decoded. The filenames are constructed from the record
timestamp (year, month, day, hours, minutes, seconds), the record
type, the cpu number, and a suffix starting at '.0' that separates
multiple events with the same timestamp.
If either type or directory is specified, then both are required.
If neither is supplied, then a filename must be supplied.
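The three drop policies described under -i pct, -s pct and -l limit can be sketched in Python. This is a minimal illustration, not salinfo_decode's own code: the names percent_used, fs_limit_exceeded and MinuteLimiter are hypothetical, space usage is assumed to be measured with f_bfree, and the per-minute limit is assumed to be a sliding 60-second window rather than fixed clock minutes.

```python
import os
import time
from collections import deque
from typing import Optional


def percent_used(total: int, free: int) -> Optional[float]:
    """Percentage of `total` in use, or None when the filesystem reports no
    fixed total (e.g. f_files == 0 where inodes are allocated dynamically)."""
    if total == 0:
        return None
    return 100.0 * (total - free) / total


def fs_limit_exceeded(path: str, inode_pct: Optional[float] = None,
                      space_pct: Optional[float] = None) -> Optional[str]:
    """Return "inode" or "space" if writing should stop (-i / -s), else None."""
    st = os.statvfs(path)
    inodes = percent_used(st.f_files, st.f_ffree)
    # Assumption: space is measured with f_bfree; the real tool may use f_bavail.
    space = percent_used(st.f_blocks, st.f_bfree)
    if inode_pct is not None and inodes is not None and inodes > inode_pct:
        return "inode"
    if space_pct is not None and space is not None and space > space_pct:
        return "space"
    return None


class MinuteLimiter:
    """Hypothetical -l limit: allow at most `limit` records in any 60-second
    window (a sliding-window interpretation of "within a minute")."""

    def __init__(self, limit: int, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.stamps = deque()
        self.dropped = 0  # count that would be reported to syslog later

    def allow(self) -> bool:
        now = self.clock()
        # Forget records that fell out of the 60-second window.
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            self.dropped += 1
            return False
        self.stamps.append(now)
        return True
```

In this sketch a dropped record is only counted; clearing it from SAL remains the caller's job, since the text above says dropped records are still cleared.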
OPERATION
When type and directory are supplied, salinfo_decode will open
/proc/sal/type/event and wait until the kernel supplies the
number of a cpu that has a record of this type. salinfo_decode
then :-
- *
- Reads the record from the kernel.
- *
- Extracts the timestamp.
- *
- Generates a unique filename from the timestamp,
type, and cpu number.
- *
- If the raw record matches an entry in directory/raw
then the new record is discarded, with a syslog entry listing the
duplicate name, otherwise ...
- *
- Writes the raw record to directory/raw.
- *
- Decodes the raw record into directory/decoded,
calling salinfo_decode_oem to decode any OEM data as required (only
if salinfo_decode_oem exists).
- *
- Clears the record from SAL.
- *
- Waits for another record of this type.
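The filename and duplicate-detection steps in the list above can be sketched as follows. The name layout used here (YYYY-MM-DD-HH_MM_SS-cpuN-type.suffix) is an assumption for illustration; the text only says the name is built from the timestamp, type, cpu number and a suffix starting at '.0'. The sketch also assumes an identical record always yields the same base name, so it compares only against files sharing that base. base_name, choose_name and store_raw are hypothetical helpers.

```python
import os
from datetime import datetime
from typing import Dict, Tuple


def base_name(ts: datetime, rec_type: str, cpu: int) -> str:
    # Assumed layout: timestamp, cpu number, type; the '.N' suffix comes later.
    return f"{ts:%Y-%m-%d-%H_%M_%S}-cpu{cpu}-{rec_type}"


def choose_name(base: str, existing: Dict[str, bytes], raw: bytes) -> Tuple[str, bool]:
    """Pick the first free '.N' suffix, or report a duplicate.

    `existing` maps names already in directory/raw to their contents.
    Returns (name, is_duplicate)."""
    suffix = 0
    while True:
        name = f"{base}.{suffix}"
        if name not in existing:
            return name, False
        if existing[name] == raw:
            return name, True  # identical record already saved as `name`
        suffix += 1


def store_raw(directory: str, ts: datetime, rec_type: str, cpu: int,
              raw: bytes) -> Tuple[str, bool]:
    raw_dir = os.path.join(directory, "raw")  # assumed to exist already
    base = base_name(ts, rec_type, cpu)
    existing = {}
    for entry in os.listdir(raw_dir):
        if entry.startswith(base + "."):
            with open(os.path.join(raw_dir, entry), "rb") as f:
                existing[entry] = f.read()
    name, duplicate = choose_name(base, existing, raw)
    if not duplicate:
        # "xb" refuses to overwrite a file that appeared since the scan.
        with open(os.path.join(raw_dir, name), "xb") as f:
            f.write(raw)
    return name, duplicate
```

When store_raw reports a duplicate, a caller following the steps above would log the returned name to syslog and discard the new record.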
When only a filename is specified,
salinfo_decode assumes it is a raw
record, reads it, and decodes it without invoking SAL.
The trigger file is provided to make post-processing more efficient by
avoiding frequent polling in the post-processing program. However, the
post-processing program should not assume that it receives a trigger line for
every SAL record; there are many cases where the trigger may be lost. These
include any time that salinfo_decode is running but the post-processing program
is not, especially at boot, and any time that post-processing is slower than
the rate at which SAL records are being generated. The post-processor should
therefore periodically scan the SAL log directories for any records that have
not been processed yet; however, this can be done every few hours instead of
every few seconds.
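A post-processing program driven by -T might be structured like the sketch below. It assumes the trigger line is whitespace-separated with -t and -D given as separate tokens; parse_trigger, erase_record, unprocessed and follow_triggers are hypothetical names. Because triggers can be lost, unprocessed provides the periodic fallback scan of the raw directory.

```python
import os
from typing import Callable, Iterable, List, Tuple


def parse_trigger(line: str) -> Tuple[str, str, str]:
    """Return (base filename, type, directory) from one trigger line."""
    fields = line.split()
    if len(fields) < 5 or fields[1] != "-t" or fields[3] != "-D":
        raise ValueError(f"unexpected trigger line: {line!r}")
    return fields[0], fields[2], fields[4]


def erase_record(directory: str, base: str) -> None:
    # Erase the decoded record before the raw record, as the -T
    # description recommends.
    for sub in ("decoded", "raw"):
        try:
            os.remove(os.path.join(directory, sub, base))
        except FileNotFoundError:
            pass


def unprocessed(directory: str, processed: Iterable[str]) -> List[str]:
    """Fallback scan for raw records that never produced a trigger line."""
    done = set(processed)
    return sorted(n for n in os.listdir(os.path.join(directory, "raw"))
                  if n not in done)


def follow_triggers(path: str, handle: Callable[[str, str, str], None]) -> None:
    # If `path` is a FIFO, open() blocks until a writer opens the other end.
    with open(path) as f:
        for line in f:
            if line.strip():
                handle(*parse_trigger(line))
```

Running unprocessed every few hours, rather than relying on triggers alone, matches the advice above that triggers are an optimisation, not a guarantee.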
SYSLOG MESSAGES
If salinfo_decode has to drop records for any reason, it records the
number of dropped records and the reason that they were dropped. Every 30
minutes, or when an ALRM or HUP signal is received, salinfo_decode logs
the number of dropped records and the reasons for dropping them to syslog,
but only if records have been dropped since the previous check. The syslog
messages are of the form
salinfo_decode[<pid>]: <n> <type> records dropped since <date>,
followed by the number of records that were dropped due to the restrictions
set by -i pct, -s pct and -l limit. If -T was specified and any
trigger records have been dropped (but the original record was processed),
then the log entry reads
salinfo_decode[<pid>]: <n> <type> trigger records dropped since <date>
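A log monitor can pick these messages out of syslog with a regular expression. This sketch handles both documented forms; it assumes the <date> field contains no comma, and because the format of the per-reason counts after the comma is not specified here, it ignores them. parse_drop_message is a hypothetical helper.

```python
import re
from typing import Optional

# Matches both documented forms; assumes <date> contains no comma.
DROP_RE = re.compile(
    r"salinfo_decode\[(?P<pid>\d+)\]: (?P<n>\d+) (?P<type>\w+) "
    r"(?P<trigger>trigger )?records dropped since (?P<date>[^,]+)"
)


def parse_drop_message(line: str) -> Optional[dict]:
    """Return the fields of a drop report, or None for unrelated lines."""
    m = DROP_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "count": int(m["n"]),
        "type": m["type"],
        "trigger": m["trigger"] is not None,
        "since": m["date"].strip(),
    }
```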
OEM DATA
The Itanium SAL specification defines the overall structure of SAL error
records, but the records may contain platform-specific information. To decode
platform-specific OEM data, salinfo_decode attempts to invoke a program
called salinfo_decode_oem. If that program does not exist, or it exists
but does not decode the OEM data, the OEM data is printed in hex.
salinfo_decode_oem is invoked with the same file descriptors as
salinfo_decode. In particular, both programs can access the file
descriptor used to read the raw record.
Communication between salinfo_decode and salinfo_decode_oem is via
a pair of pipes, which salinfo_decode_oem sees as file descriptors 0
and 1. To decode OEM data, salinfo_decode writes the following to
salinfo_decode_oem via a pipe :-
- *
- A line of
"==== salinfo_decode_oem start ====".
- *
- A variable number of lines of the form
"key=value". salinfo_decode_oem must ignore keys that it
does not recognise. At a minimum, values must be supplied for these
keys :-
- -
- fd_data - file descriptor for accessing the raw
record.
- -
- use_sal - 1 if the raw record is being accessed via
SAL.
- -
- cpu - the cpu that the record belongs to, -1 for records
that are not being read directly from SAL.
- -
- raw_length - the length of the raw record.
- -
- oem_section_offset - the offset of the section containing
the OEM data to decode.
- *
- A line of
"==== salinfo_decode_oem record ====".
- *
- The raw record, exactly raw_length bytes, followed by a
newline.
- *
- A line of
"==== salinfo_decode_oem end ====".
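The request described in the list above can be sketched as a byte-string builder. build_oem_request is a hypothetical helper, not part of salinfo_decode; it checks that the required keys are present and that raw_length matches the record before framing the data with the three marker lines.

```python
from typing import Mapping

REQUIRED_KEYS = ("fd_data", "use_sal", "cpu", "raw_length", "oem_section_offset")


def build_oem_request(keys: Mapping[str, object], raw: bytes) -> bytes:
    """Frame one OEM decode request in the line format described above."""
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    if int(keys["raw_length"]) != len(raw):
        raise ValueError("raw_length must equal the size of the raw record")
    out = bytearray(b"==== salinfo_decode_oem start ====\n")
    for key, value in keys.items():
        out += f"{key}={value}\n".encode("ascii")
    out += b"==== salinfo_decode_oem record ====\n"
    out += raw + b"\n"  # exactly raw_length bytes, then a newline
    out += b"==== salinfo_decode_oem end ====\n"
    return bytes(out)
```

Extra keys may be passed through unchanged, since salinfo_decode_oem is required to ignore keys it does not recognise.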
salinfo_decode_oem reads the above data on fd 0, decodes the OEM data if
possible and writes the decoded output on its fd 1. Output from
salinfo_decode_oem consists of :-
- *
- A line of
"==== salinfo_decode_oem start ====".
- *
- The decoded OEM data. If salinfo_decode_oem cannot
decode this OEM data, it returns no data here, and salinfo_decode
will print the OEM data in hex.
- *
- A line of
"==== salinfo_decode_oem end ====".
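A skeleton salinfo_decode_oem written in Python might look like this. It is a sketch only: read_request, serve and decode_oem are hypothetical names, it assumes salinfo_decode may send more than one request before closing the pipe, and decode_oem is a placeholder that returns no text, which makes salinfo_decode fall back to printing the OEM data in hex.

```python
import io
import sys
from typing import Dict, Optional, Tuple

START = b"==== salinfo_decode_oem start ====\n"
RECORD = b"==== salinfo_decode_oem record ====\n"
END = b"==== salinfo_decode_oem end ====\n"


def read_request(inp: io.BufferedIOBase) -> Optional[Tuple[Dict[str, str], bytes]]:
    """Read one request; return (keys, raw) or None on a clean EOF."""
    line = inp.readline()
    if not line:
        return None
    if line != START:
        raise ValueError(f"expected start line, got {line!r}")
    keys: Dict[str, str] = {}
    while True:
        line = inp.readline()
        if not line:
            raise ValueError("EOF before record line")
        if line == RECORD:
            break
        key, sep, value = line.rstrip(b"\n").decode("ascii").partition("=")
        if sep:
            keys[key] = value  # unknown keys are stored but never used
    raw = inp.read(int(keys["raw_length"]))  # exactly raw_length bytes
    if inp.read(1) != b"\n" or inp.readline() != END:
        raise ValueError("malformed request trailer")
    return keys, raw


def decode_oem(keys: Dict[str, str], raw: bytes) -> str:
    """Placeholder platform decoder; "" means 'cannot decode'."""
    return ""


def serve(inp: io.BufferedIOBase, out: io.BufferedIOBase) -> None:
    while (req := read_request(inp)) is not None:
        text = decode_oem(*req)
        if text and not text.endswith("\n"):
            text += "\n"  # keep the end marker on its own line
        out.write(START + text.encode() + END)
        out.flush()


if __name__ == "__main__":
    serve(sys.stdin.buffer, sys.stdout.buffer)
```

A real decoder would replace decode_oem, for example by parsing the section at oem_section_offset within raw.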