SGE_SHADOWD(8) | Grid Engine Administrative Commands | SGE_SHADOWD(8) |
NAME¶
sge_shadowd - Grid Engine shadow master daemon
SYNOPSIS¶
sge_shadowd
DESCRIPTION¶
sge_shadowd is a lightweight process which can be run on so-called shadow master hosts in a Grid Engine cluster to detect failure of the current Grid Engine master daemon, sge_qmaster(8), and to start up a new sge_qmaster(8) on the host on which sge_shadowd runs. If multiple shadow daemons are active in a cluster, they run a protocol which ensures that only one of them will start up a new master daemon.
Hosts suitable as shadow master hosts must have shared root read/write access to the directory $SGE_ROOT/$SGE_CELL/common, as well as to the master daemon spool directory (by default $SGE_ROOT/$SGE_CELL/spool/qmaster). The names of the shadow master hosts must be listed in the file $SGE_ROOT/$SGE_CELL/common/shadow_masters.
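As an illustration of the shadow_masters requirement, the following minimal Python sketch checks whether a host name is listed in that file. It is not part of Grid Engine. The function name is_shadow_master is hypothetical, and the sketch assumes the file holds one host name per line, that blank lines can be ignored, and that names compare case-insensitively.

```python
from pathlib import Path


def is_shadow_master(shadow_masters_file, hostname):
    """Return True if hostname is listed in the given shadow_masters file.

    Assumptions (not taken from the man page): one host name per line,
    blank lines are ignored, and comparison is case-insensitive.
    """
    path = Path(shadow_masters_file)
    if not path.is_file():
        # A missing file means no host is registered as a shadow master.
        return False
    listed = {
        line.strip().lower()
        for line in path.read_text().splitlines()
        if line.strip()
    }
    return hostname.strip().lower() in listed
```

A typical call would pass the path $SGE_ROOT/$SGE_CELL/common/shadow_masters together with the local host name, for example the value returned by socket.gethostname().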
RESTRICTIONS¶
sge_shadowd may only be started by root.
ENVIRONMENT VARIABLES¶
- SGE_ROOT
- Specifies the location of the Grid Engine standard configuration files.
- SGE_CELL
- If set, specifies the default Grid Engine cell. To address a Grid Engine cell sge_shadowd uses (in order of precedence):
The name of the cell specified in the environment variable SGE_CELL, if it is set.
The name of the default cell, i.e. default.
- SGE_DEBUG_LEVEL
- If set, specifies that debug information should be written to stderr. The value also defines the level of detail of the debug information that is generated.
- SGE_QMASTER_PORT
- If set, specifies the TCP port on which sge_qmaster(8) is expected to listen for communication requests. Most installations will use a services map entry for the service "sge_qmaster" instead to define that port.
- SGE_DELAY_TIME
- This variable controls the time for which sge_shadowd pauses if a takeover bid fails. This value is used only when there are multiple sge_shadowd instances and they are contending to be the master. The default is 600 seconds.
- SGE_CHECK_INTERVAL
- This variable controls the interval between sge_shadowd checks of the heartbeat file (60 seconds by default).
- SGE_GET_ACTIVE_INTERVAL
- This variable controls the interval between attempts by a sge_shadowd instance to take over when the heartbeat file has not changed. The default is 240 seconds.
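The three timing variables above and their defaults can be summarized in a short Python sketch. It only illustrates how such settings could be read from the environment and is not how sge_shadowd itself is implemented. The function name shadowd_intervals is hypothetical, and falling back to the default for a missing, non-numeric, or non-positive value is an assumption of this sketch.

```python
import os

# Defaults in seconds, as stated in the variable descriptions above.
DEFAULTS = {
    "SGE_DELAY_TIME": 600,           # pause after a failed takeover bid
    "SGE_CHECK_INTERVAL": 60,        # interval between heartbeat file checks
    "SGE_GET_ACTIVE_INTERVAL": 240,  # interval between takeover attempts
}


def shadowd_intervals(env=None):
    """Return the three intervals as integers, applying the defaults.

    Assumption: invalid or non-positive values fall back to the default.
    """
    env = os.environ if env is None else env
    result = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        try:
            value = int(raw) if raw is not None else default
        except ValueError:
            value = default
        result[name] = value if value > 0 else default
    return result
```

Passing a plain dictionary instead of os.environ, such as shadowd_intervals({"SGE_CHECK_INTERVAL": "30"}), makes it easy to preview the effect of a setting before exporting it in the daemon's start-up environment.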
FILES¶
<sge_root>/<cell>/common                    Default configuration directory
<sge_root>/<cell>/common/shadow_masters     Shadow master hostname file
<sge_root>/<cell>/spool/qmaster             Default master daemon spool directory
<sge_root>/<cell>/spool/qmaster/heartbeat   The heartbeat file
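The file locations listed above can be derived from SGE_ROOT and SGE_CELL. The following Python sketch is a hedged illustration, not Grid Engine code: the function name shadowd_paths and the dictionary keys are hypothetical, and it assumes the default spool location, which a site may have changed.

```python
import os
from pathlib import Path


def shadowd_paths(env=None):
    """Build the default file locations from SGE_ROOT and SGE_CELL.

    Assumption: the master spool directory is at its default location.
    """
    env = os.environ if env is None else env
    root = env.get("SGE_ROOT")
    if not root:
        raise ValueError("SGE_ROOT is not set")
    # Fall back to the default cell name when SGE_CELL is unset or empty.
    cell = env.get("SGE_CELL") or "default"
    common = Path(root) / cell / "common"
    spool = Path(root) / cell / "spool" / "qmaster"
    return {
        "common": common,
        "shadow_masters": common / "shadow_masters",
        "spool": spool,
        "heartbeat": spool / "heartbeat",
    }
```

For example, shadowd_paths({"SGE_ROOT": "/opt/sge"}) resolves the heartbeat file under the cell named default, because SGE_CELL is not set in that mapping.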
SEE ALSO¶
COPYRIGHT¶
See sge_intro(1) for a full statement of rights and permissions.
2007-11-08 | SGE 8.1.3pre |