oarnodecheckrun(8) | OAR commands | oarnodecheckrun(8) |
NAME¶
oarnodecheck - OAR node health check mechanism
SYNOPSIS¶
oarnodecheckrun
oarnodecheckquery
oarnodechecklist
DESCRIPTION¶
oarnodecheck is composed of 3 commands:
- oarnodecheckrun
- oarnodecheckrun must be run as root by cron or a systemd timer (on an hourly basis, for instance) to execute all check scripts in the /etc/oar/check.d/ directory.
The /etc/oar/check.d/ directory contains admin-defined scripts, which check for possible node health problems.
If and only if a problem is detected, the check script must create a check-log file in the check-log directory. The check script must use the CHECKLOGFILE environment variable, which provides the pathname of the check-log file to create in that case.
If the OAR cpuset mechanism is enabled, oarnodecheckrun does not launch checks when jobs are running on the node. A stamp file is created or updated when the scripts are actually run.
- oarnodecheckquery
- oarnodecheckquery is meant to be called by the OAR ping checker, to
report the node health status.
This can be configured in the /etc/oar/oar.conf file of the OAR server:
PINGCHECKER_TAKTUK_ARG_COMMAND="-t 3 broadcast exec [ /usr/bin/oarnodecheckquery ]"
The OAR node health status is reported as bad as soon as a check-log file exists in the check-log directory: /var/lib/oar/checklogs/.
oarnodecheckquery checks for the existence and modification date of the oarnodecheckrun stamp file. If the stamp file does not exist or is older than one hour, oarnodecheckrun is run. Finally, oarnodecheckquery reports an error if any check-log exists in the check-log directory.
Since oarnodecheckquery may run the check scripts, the OAR ping checker timeout must be tuned accordingly in the OAR server configuration.
- oarnodechecklist
- oarnodechecklist lists the currently recorded check-logs.
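The decision logic described for oarnodecheckquery can be sketched as follows. This is a minimal illustration, not the actual OAR implementation: the function names (stamp_is_fresh, has_checklogs, query_node) are hypothetical, the stamp file and check-log directory are passed as arguments so that the sketch can be tried in a scratch directory, and the command given to query_node is only a stand-in for oarnodecheckrun. The cpuset-related behaviour (no checks while jobs are running) is not modelled.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers, NOT the real oarnodecheckquery code.

# stamp_is_fresh STAMPFILE
# Succeeds if STAMPFILE exists and was modified less than 60 minutes ago.
stamp_is_fresh() {
    [ -e "$1" ] && [ -n "$(find "$1" -mmin -60 2>/dev/null)" ]
}

# has_checklogs DIR
# Succeeds if DIR exists and contains at least one entry.
has_checklogs() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# query_node STAMPFILE CHECKLOG_DIR RUN_COMMAND [ARGS...]
# Runs RUN_COMMAND when the stamp is missing or stale, then returns
# non-zero if any check-log is present.
query_node() {
    local stamp=$1 logdir=$2
    shift 2
    if ! stamp_is_fresh "$stamp"; then
        "$@"    # stand-in for running oarnodecheckrun
    fi
    if has_checklogs "$logdir"; then
        return 1
    fi
    return 0
}

# Demo in a scratch directory (no real OAR path is touched).
scratch=$(mktemp -d)
mkdir "$scratch/checklogs"
if query_node "$scratch/stamp" "$scratch/checklogs" touch "$scratch/stamp"; then
    echo "node reported healthy"
else
    echo "node reported unhealthy"
fi
rm -rf "$scratch"
```

In the demo, the stand-in command simply creates the stamp file, so a second call within the hour would skip the run step and only look for check-logs.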
EXAMPLE OF CHECK SCRIPT¶
The following is an example of a check script to place in the check scripts directory, /etc/oar/check.d:
#!/bin/bash
###############################################################################
# Perform a check and report to CHECKLOGFILE
# WARNING:
# The CHECKLOGFILE file must not be created unless the check really unveiled
# a problem.
# Print to stderr if CHECKLOGFILE is not defined yet (e.g. as the script is
# not called from oarnodecheckrun the CHECKLOGFILE environment variable is
# not defined)
[ -n "$CHECKLOGFILE" ] || CHECKLOGFILE=/dev/stderr
###############################################################################
# YOUR CHECK SCRIPT GOES BELOW
# Example of check
[ -d /var/lib/oar ] || echo "OAR runtime directory (/var/lib/oar) does not exist" > "$CHECKLOGFILE"
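Before installing a check script, it may help to run it by hand with CHECKLOGFILE pointing at a scratch file, to confirm that the file is created only when the check fails. The sketch below assumes a hypothetical check script (check_dir) that follows the same template; the CHECK_DIR variable is introduced purely for illustration so that both outcomes can be triggered, and is not part of OAR.

```shell
#!/bin/bash
# Sketch only: exercises a hypothetical check script in a scratch directory.
scratch=$(mktemp -d)

# Hypothetical check script built on the template; CHECK_DIR is an
# illustration-only variable naming the directory to test.
cat > "$scratch/check_dir" <<'EOF'
#!/bin/bash
[ -n "$CHECKLOGFILE" ] || CHECKLOGFILE=/dev/stderr
[ -d "$CHECK_DIR" ] || echo "directory $CHECK_DIR does not exist" > "$CHECKLOGFILE"
EOF

# Passing case: the directory exists, so no check-log should be created.
CHECK_DIR="$scratch" CHECKLOGFILE="$scratch/pass.log" bash "$scratch/check_dir"

# Failing case: the directory is missing, so a check-log should be created.
CHECK_DIR="$scratch/missing" CHECKLOGFILE="$scratch/fail.log" bash "$scratch/check_dir"

# Record the results before cleaning up.
if [ -e "$scratch/pass.log" ]; then pass_created=yes; else pass_created=no; fi
fail_message=$(cat "$scratch/fail.log" 2>/dev/null)

echo "check-log created when the check passes: $pass_created"
echo "check-log content when the check fails: $fail_message"
rm -rf "$scratch"
```

A check script that writes to CHECKLOGFILE unconditionally would leave pass.log behind here, which is the mistake the WARNING in the template is meant to prevent.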
COPYRIGHTS¶
Copyright 2003-2025 Laboratoire d'Informatique de Grenoble (http://www.liglab.fr). This software is licensed under the GNU General Public License Version 2 or above. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2025-02-27 | oarnodecheckrun |