NAME
lamssi_boot - overview of LAM's boot SSI modules
DESCRIPTION
The "kind" for boot SSI modules is "boot". Specifically, the
string "boot" (without the quotes) is the prefix used for arguments when
passing values to boot modules at run time. For example:
- lamboot -ssi boot rsh hostfile
- Selects the "rsh" boot module and boots LAM across all the nodes
listed in the file hostfile.
LAM currently has several boot modules: bproc, globus, rsh (which includes ssh),
slurm, and tm.
The LAM/MPI User's Guide contains much detail about all of the boot modules. All
users are strongly encouraged to read it. This man page is a summary of the
available information.
SELECTING A BOOT MODULE
Only one boot module may be selected per command execution. Hence, the selection
of which module to use occurs once, when a given command initializes. Once the
module is chosen, it is used for the duration of the program run.
In most cases, LAM will automatically select the "best" module at
run-time. LAM will query all available modules at run time to obtain a list of
priorities. The module with the highest priority will be used. If multiple
modules return the same priority, LAM will select one at random. Priorities
are in the range of 0 to 100, with 0 being the lowest priority and 100 being
the highest. At run time, each module will examine the run-time environment
and return a priority value that is appropriate.
For example, when running a PBS job, the tm module will return a
sufficiently high priority value such that it will be selected and the other
available modules will not.
Most modules accept run-time parameters that override the priorities they
return, which changes the order (and therefore the ultimate selection) of the
available boot modules. See below.
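The automatic selection rule described above can be sketched in a few lines of
Python. This is an illustration only, not LAM's implementation: the function
name select_boot_module, the use of None to mean "module cannot run here", and
the example priorities are assumptions made for the sketch.

```python
import random

def select_boot_module(reported, overrides=None, rng=random):
    """Pick the module with the highest priority; break ties at random.

    reported:  dict of module name -> priority (0-100), or None when the
               module cannot run in this environment (sketch convention).
    overrides: dict of module name -> user-supplied priority, standing in
               for parameters such as boot_rsh_priority.
    """
    overrides = overrides or {}
    candidates = {}
    for name, priority in reported.items():
        if priority is None:
            continue  # unavailable module: an override does not revive it
        priority = overrides.get(name, priority)
        if not 0 <= priority <= 100:
            raise ValueError(f"priority for {name} out of range: {priority}")
        candidates[name] = priority
    if not candidates:
        raise RuntimeError("no boot module is available")
    best = max(candidates.values())
    winners = sorted(n for n, p in candidates.items() if p == best)
    return rng.choice(winners)

# Hypothetical priorities as they might be reported inside a PBS job.
in_pbs_job = {"rsh": 0, "tm": 50, "slurm": None}
print(select_boot_module(in_pbs_job))               # tm
print(select_boot_module(in_pbs_job, {"rsh": 75}))  # rsh
```

Raising the rsh priority above 50 changes the outcome without naming a module
explicitly, which is the effect the per-module priority parameters have.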
Alternatively, a specific module may be selected by the user by specifying a
value for the boot parameter (either by environment variable or by the
-ssi command line parameter). In this case, no other modules will be
queried by LAM. If the named module returns a valid priority, it will be used.
For example:
- lamboot -ssi boot rsh hostfile
- Tells LAM to only query the rsh boot module and see if it is
available to run.
If the boot module that is selected is unable to run (e.g., attempting to use
the tm boot module when not running in a PBS job), an appropriate error
message will be printed and execution will abort.
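Explicit selection differs from automatic selection in that only the named
module is queried. A minimal Python sketch of that behavior follows; the
function name select_named_module and the query callable are hypothetical, and
the exception stands in for the error message and abort described above.

```python
def select_named_module(name, query):
    """Query only the named module and use it if it reports a valid priority.

    query: callable mapping a module name to a priority (0-100), or None
           when that module cannot run in the current environment.
    """
    priority = query(name)
    if priority is None or not 0 <= priority <= 100:
        raise RuntimeError(f"boot module '{name}' is unable to run here")
    return name

# Hypothetical environment: not inside a PBS job, so tm is unavailable.
available = {"rsh": 0}
print(select_named_module("rsh", available.get))
try:
    select_named_module("tm", available.get)
except RuntimeError as err:
    print(err)
```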
AVAILABLE MODULES
As with all SSI modules, it is possible to pass parameters at run time. This
section discusses the built-in LAM boot modules, as well as the run-time
parameters that they accept.
In the discussion below, parameters to boot modules are discussed in terms of
name and value. The name and value may be specified as command line arguments
to the lamboot, lamgrow, recon, and lamwipe commands with the -ssi switch, or
they may be set in environment variables of the form LAM_MPI_SSI_name=value.
Note that using the -ssi command line switch will take precedence over any
previously-set environment variables.
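The precedence between the two mechanisms can be illustrated with a short
Python sketch. It is not LAM's argument parser: the helper names
parse_ssi_args and resolve_ssi_params are hypothetical, and the sketch assumes
each -ssi switch is followed by exactly one name and one value.

```python
import os

ENV_PREFIX = "LAM_MPI_SSI_"

def parse_ssi_args(argv):
    """Collect every '-ssi name value' triple from a command line."""
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] == "-ssi" and i + 2 < len(argv):
            params[argv[i + 1]] = argv[i + 2]
            i += 3
        else:
            i += 1
    return params

def resolve_ssi_params(argv, environ=None):
    """Merge environment settings with -ssi switches; the switches win."""
    environ = os.environ if environ is None else environ
    params = {
        key[len(ENV_PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX)
    }
    params.update(parse_ssi_args(argv))  # command line overrides environment
    return params

env = {"LAM_MPI_SSI_boot": "tm", "LAM_MPI_SSI_boot_rsh_agent": "ssh"}
argv = ["lamboot", "-ssi", "boot", "rsh", "hostfile"]
print(resolve_ssi_params(argv, env))
```

Here the environment asks for the tm module, but the -ssi switch on the
command line replaces that with rsh, while boot_rsh_agent is still taken from
the environment.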
bproc Boot Module
The bproc boot module uses native bproc functionality (e.g., the
bproc_execmove library call) to launch jobs on slave nodes from the
head node. Checks are made before launching to ensure that the nodes are
available and are "owned" by the user and/or the user's group.
Appropriate error messages will be displayed if the user is unable to execute
on the target nodes.
Hostnames should be specified using bproc notation: -1 indicates the head node,
and integer numbers starting with 0 represent slave nodes. The string
"localhost" will automatically be converted to "-1".
The default behavior is to mark the bproc head node as
"non-schedulable", meaning that the expansion of "N" and
"C" when used with mpirun and lamexec will exclude the bproc head node.
For example, "mpirun C my_mpi_program" will run copies of my_mpi_program
on all lambooted slave nodes, but not the bproc head node.
Note that the bproc boot module is only usable from the bproc head node.
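The hostname convention and the default exclusion of the head node can be
sketched as follows. This Python fragment is only an illustration: the
function names are hypothetical, and it models node selection only, ignoring
the per-CPU counting that "C" implies.

```python
HEAD_NODE = -1

def normalize_bproc_host(name):
    """Map a host entry to a bproc node number; 'localhost' becomes -1."""
    if name == "localhost":
        return HEAD_NODE
    node = int(name)  # raises ValueError for anything that is not an integer
    if node < HEAD_NODE:
        raise ValueError(f"invalid bproc node number: {node}")
    return node

def schedulable_nodes(hosts):
    """Return the nodes left after excluding the non-schedulable head node."""
    nodes = [normalize_bproc_host(h) for h in hosts]
    return [n for n in nodes if n != HEAD_NODE]

print(schedulable_nodes(["localhost", "0", "1", "2"]))  # [0, 1, 2]
```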
The bproc boot module only has one tunable parameter:
- boot_bproc_priority
- Using the priority argument can override LAM's automatic run-time boot
module selection algorithms. This parameter only has effect when the
bproc module is eligible to be run (i.e., when running on a bproc
cluster).
See the bproc notes in the user documentation for more details.
globus Boot Module
The globus boot module uses the globus-job-run command to launch executables on
remote nodes. It is currently limited to only allowing jobs that can use the
fork job manager on the Globus gatekeeper. Other job managers are not yet
supported.
LAM will effectively never select the globus boot module by default
because it has an extremely low default priority; it must be manually selected
with the boot SSI parameter or have its priority raised. Additionally, LAM
must be able to find the globus-job-run command in your PATH.
The boot schema requires each host to be listed as a Globus contact string. For
example:
"host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc"
Note the use of quotes because the CN includes spaces -- the entire contact name
must be enclosed in quotes. Additionally, since globus-job-run does not invoke
the user's "dot" files on the remote nodes, no PATH or environment
is set up. Hence, the attribute lam_install_path must be specified for
each contact string in the hostfile so that LAM knows where to find its
executables on the remote nodes. For example:
"host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc" lam_install_path=/home/lam
The globus boot module only has one tunable parameter:
- boot_globus_priority
- Using the priority argument can override LAM's automatic run-time boot
module selection algorithms.
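To make the quoting rule for globus boot schema lines concrete, the following
Python sketch splits such a line into its contact string and attributes. It is
not LAM's boot schema parser: the function name parse_globus_host_line is
hypothetical, and shell-style quoting (via the standard shlex module) is
assumed to approximate how the quoted contact string is read.

```python
import shlex

def parse_globus_host_line(line):
    """Split a boot schema line into the contact string and its attributes."""
    tokens = shlex.split(line)  # keeps the quoted contact string in one piece
    if not tokens:
        raise ValueError("empty host line")
    contact, attrs = tokens[0], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {token}")
        attrs[key] = value
    if "lam_install_path" not in attrs:
        raise ValueError("lam_install_path is required for globus hosts")
    return contact, attrs

line = '"host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc" lam_install_path=/home/lam'
contact, attrs = parse_globus_host_line(line)
print(contact)
print(attrs["lam_install_path"])
```

Without the quotes, the same line would split at the spaces inside the CN and
the contact string would be truncated after "aaa".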
rsh Boot Module
The rsh boot module uses rsh or ssh (or any other command line agent that acts
like rsh/ssh) to launch executables on remote nodes. It requires that
executables can be started on remote nodes without being prompted for a
password, and without outputting anything to stderr.
The rsh boot module is always available, and unless overridden, always
assigns itself a priority of 0.
The rsh module accepts a few run-time parameters:
- boot_rsh_agent
- Used to override the compiled-in default remote agent program that was
selected when LAM was compiled. For example, this parameter can be set to
use "ssh" if LAM was compiled to use "rsh" by default.
Previous versions of LAM/MPI used the LAMRSH environment variable for this
purpose. While the LAMRSH environment variable still works, its use is
deprecated in favor of the boot_rsh_agent SSI module argument.
- boot_rsh_priority
- Using the priority argument can override LAM's automatic run-time boot
module selection algorithms.
- boot_rsh_username
- If the user has a different username on the remote machine, this parameter
can be used to pass the -l argument to the underlying remote agent.
Note that this is a coarse-grained control -- this one username will be
used for all remote nodes. If more fine-grained control is required, the
username should be specified in the boot schema file on a per-host
basis.
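How the rsh parameters above might combine into a remote command line is
sketched below in Python. Several details are assumptions of the sketch rather
than documented behavior: the helper names, the order in which
boot_rsh_agent, LAMRSH, and the compiled-in default are consulted, the
placement of "-l username" before the host name, and the use of "ssh -x" as an
example agent string.

```python
import shlex

def resolve_agent(params, environ, compiled_default="rsh"):
    """Choose the remote agent (precedence order is assumed for this sketch)."""
    return params.get("boot_rsh_agent") or environ.get("LAMRSH") or compiled_default

def build_agent_command(agent, host, remote_command, username=None):
    """Assemble the argument list for one remote launch."""
    argv = shlex.split(agent)      # the agent string may carry its own options
    if username:
        argv += ["-l", username]   # one username, applied to this host
    argv.append(host)
    argv += remote_command
    return argv

params = {"boot_rsh_agent": "ssh -x", "boot_rsh_username": "alice"}
agent = resolve_agent(params, environ={"LAMRSH": "rsh"})
print(build_agent_command(agent, "node1", ["hostname"],
                          username=params.get("boot_rsh_username")))
```

For per-host usernames taken from the boot schema file, the same helper would
simply be called with a different username argument for each host.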
slurm Boot Module
The slurm boot module uses the srun command to launch the LAM daemons in a
SLURM execution environment. When it detects that it is running under SLURM,
it automatically sets its priority to 50. It can be used in two different
modes: batch (where a script is submitted to SLURM and it is run on the first
node in the node allocation) and allocate (where the -A option to srun is used
to obtain an interactive allocation). The slurm boot module does not support
running in a script that is launched by SLURM on all nodes in an allocation.
No boot schema file is required when using the slurm boot module; LAM
will automatically determine the host and CPU count from SLURM itself.
The slurm boot module only has one tunable parameter:
- boot_slurm_priority
- Using the priority argument can override LAM's automatic run-time boot
module selection algorithms. This parameter only has effect when the
slurm module is eligible to be run (i.e., when running in a SLURM
allocation).
tm Boot Module
The tm boot module uses the Task Management (TM) interface to launch
executables on remote nodes. Currently, OpenPBS and PBSPro are the only
two systems that implement the TM interface. Hence, when LAM detects that it
is running in a PBS job, it will automatically set the tm priority to
50. When not running in a PBS job, the tm module will not be available.
The tm boot module only has one tunable parameter:
- boot_tm_priority
- Using the priority argument can override LAM's automatic run-time boot
module selection algorithms. This parameter only has effect when the
tm module is eligible to be run (i.e., when running in a PBS
job).
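The interaction between availability and boot_tm_priority can be sketched in
Python. This page does not say how LAM detects a PBS job; the sketch assumes
that the PBS_ENVIRONMENT and PBS_JOBID environment variables are both set
inside one, and the function name tm_priority and the sample job ID are
hypothetical.

```python
def tm_priority(environ, override=None):
    """Return the tm module's priority, or None when not in a PBS job.

    Assumption: a PBS job is recognised by PBS_ENVIRONMENT and PBS_JOBID
    both being present in the environment.
    """
    in_pbs_job = "PBS_ENVIRONMENT" in environ and "PBS_JOBID" in environ
    if not in_pbs_job:
        return None  # module unavailable; boot_tm_priority has no effect
    return 50 if override is None else override

pbs_env = {"PBS_ENVIRONMENT": "PBS_BATCH", "PBS_JOBID": "1234.server"}
print(tm_priority(pbs_env))               # 50
print(tm_priority(pbs_env, override=10))  # 10
print(tm_priority({}, override=100))      # None
```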
SEE ALSO
lamssi(7), mpirun(1), LAM User's Guide