
oarnodecheckrun(8) OAR commands oarnodecheckrun(8)

NAME

oarnodecheck - OAR node health check mechanism

SYNOPSIS

oarnodecheckrun

oarnodecheckquery

oarnodechecklist

DESCRIPTION

oarnodecheck is composed of 3 commands:

oarnodecheckrun must be run as root by cron or a systemd timer (on an hourly basis, for instance) to execute all check scripts in the /etc/oar/check.d/ directory.

The /etc/oar/check.d/ directory contains administrator-defined scripts, which check for possible node health problems.

If and only if a problem is detected, the check script must create a check-log file in the check-log directory. The check script must use the CHECKLOGFILE environment variable, which provides the pathname of the check-log file to create.

If the OAR cpuset mechanism is enabled, oarnodecheckrun does not launch checks when jobs are running on the node. A stamp file is created or updated when the scripts are actually run.
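
The run cycle described above can be pictured with the following minimal sketch. It is not the actual oarnodecheckrun implementation: the function name run_checks, its three parameters, and the naming of each check-log after its script are assumptions made for illustration, and the cpuset test for running jobs is left out.

```shell
#!/bin/bash
# Sketch only: NOT the actual oarnodecheckrun implementation.
# run_checks CHECK_DIR LOG_DIR STAMP_FILE  (hypothetical name and parameters)
run_checks() {
    local check_dir=$1 log_dir=$2 stamp_file=$3 script
    mkdir -p "$log_dir"
    for script in "$check_dir"/*; do
        # Skip anything that is not an executable regular file
        [ -f "$script" ] && [ -x "$script" ] || continue
        # Assumed naming scheme: one check-log per script, named after it.
        # The script creates this file only if it detects a problem.
        CHECKLOGFILE="$log_dir/$(basename "$script")" "$script"
    done
    # Record that the check scripts were actually run
    touch "$stamp_file"
}
```

Under these assumptions, a call such as run_checks /etc/oar/check.d /var/lib/oar/checklogs /path/to/stamp would run every executable script once and then refresh the stamp file; the stamp file path here is a placeholder, since its real location is not given above.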

oarnodecheckquery is meant to be called by the OAR ping checker, to report the node health status.

To do so, set the following in the /etc/oar/oar.conf file of the OAR server:

    PINGCHECKER_TAKTUK_ARG_COMMAND="-t 3 broadcast exec [ /usr/bin/oarnodecheckquery ]"
    

The OAR node health status is reported as bad as soon as a check-log file exists in the check-log directory: /var/lib/oar/checklogs/.

oarnodecheckquery checks the existence and modification date of the oarnodecheckrun stamp file. If the stamp file does not exist or is older than one hour, oarnodecheckrun is run. Finally, oarnodecheckquery reports an error if any check-log exists in the check-log directory.
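
The two decisions described above (is the stamp file fresh, and is the check-log directory empty) can be sketched as follows. This is an illustration under stated assumptions, not the actual oarnodecheckquery code: the function names stamp_is_fresh and node_is_healthy are hypothetical, and the age test relies on GNU stat.

```shell
#!/bin/bash
# Sketch only: NOT the actual oarnodecheckquery implementation.

# stamp_is_fresh STAMP_FILE [MAX_AGE_SECONDS]  (hypothetical helper)
# Succeeds if the stamp file exists and is at most MAX_AGE_SECONDS old
# (default: 3600 seconds, i.e. one hour).
stamp_is_fresh() {
    local stamp_file=$1 max_age=${2:-3600} now mtime
    [ -e "$stamp_file" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$stamp_file") || return 1   # GNU stat syntax
    [ $(( now - mtime )) -le "$max_age" ]
}

# node_is_healthy LOG_DIR  (hypothetical helper)
# Succeeds only if LOG_DIR contains no check-log file.
node_is_healthy() {
    local log_dir=$1
    [ -z "$(ls -A "$log_dir" 2>/dev/null)" ]
}
```

With these helpers, the query amounts to running the check scripts when stamp_is_fresh fails, then returning the status of node_is_healthy /var/lib/oar/checklogs.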

Since oarnodecheckquery may run the check scripts, the OAR ping checker timeout must be tuned accordingly in the OAR server configuration.

oarnodechecklist lists the current recorded check-logs.

EXAMPLE OF CHECK SCRIPT

The following is an example check script to place in the check scripts directory, /etc/oar/check.d:

    #!/bin/bash
    ###############################################################################
    # Perform a check and report to CHECKLOGFILE
    # WARNING:
    # The CHECKLOGFILE file must not be created unless the check really unveiled
    # a problem.
    
    # Print to stderr if CHECKLOGFILE is not defined (e.g. when the script is
    # not called from oarnodecheckrun, the CHECKLOGFILE environment variable is
    # not set)
    [ -n "$CHECKLOGFILE" ] || CHECKLOGFILE=/dev/stderr
    
    ###############################################################################
    # YOUR CHECK SCRIPT GOES BELOW
    
    # Example of check
    [ -d /var/lib/oar ] || echo "OAR runtime directory (/var/lib/oar) does not exist" > "$CHECKLOGFILE"
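
As a further illustration of the same pattern, the sketch below checks free disk space. The helper name check_free_space, the /tmp target, and the 100 MB threshold are assumptions chosen for the example, not part of OAR. The helper appends to CHECKLOGFILE, so several checks in one script do not overwrite each other, and the file is still created only when a problem is found.

```shell
#!/bin/bash
# Fall back to stderr when not called from oarnodecheckrun
[ -n "$CHECKLOGFILE" ] || CHECKLOGFILE=/dev/stderr

# check_free_space DIR MIN_KB  (hypothetical helper, not part of OAR)
# Reports a problem if the filesystem holding DIR has fewer than MIN_KB
# kilobytes available, or if the available space cannot be determined.
check_free_space() {
    local dir=$1 min_kb=$2 avail_kb
    # POSIX df output: column 4 of the second line is the available space
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    if [ -z "$avail_kb" ]; then
        echo "Cannot determine free space on $dir" >> "$CHECKLOGFILE"
    elif [ "$avail_kb" -lt "$min_kb" ]; then
        echo "Low disk space on $dir: $avail_kb KB available, $min_kb KB required" >> "$CHECKLOGFILE"
    fi
}

# Example call (threshold is an arbitrary illustration):
# check_free_space /tmp 102400
```

Uncommenting the last line turns the sketch into a complete check script that flags the node when /tmp has less than about 100 MB available.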

COPYRIGHTS

 Copyright 2003-2025 Laboratoire d'Informatique de Grenoble (http://www.liglab.fr). This software is licensed under the GNU General Public License Version 2 or above. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2025-02-27 oarnodecheckrun