NAME¶
ompi-restart, orte-restart - Restart a previously checkpointed parallel job
using the Open PAL Checkpoint/Restart Service (CRS)
NOTE: ompi-restart and
orte-restart are exact synonyms
for each other. Using either name will result in exactly identical
behavior.
SYNOPSIS¶
ompi-restart [ options ] <GLOBAL SNAPSHOT HANDLE>
Options¶
ompi-restart will attempt to restart a previously checkpointed parallel
job from the global snapshot handle reference returned by
ompi-checkpoint.
- <GLOBAL SNAPSHOT HANDLE>
- The global snapshot handle reference returned by
ompi-checkpoint, used to restart the job. It must be
the last argument to this command.
- -h | --help
- Display help for this command
- -p | --preload
- Preload the checkpoint files on the remote systems before
restarting the application. Disabled by default.
- --fork
- Fork off a new process, which is the restarted process. By
default, the restarted process will replace ompi-restart.
- -s | --seq
- The sequence number of the checkpoint to restart from. By
default, the most recent sequence number is used (equivalent to specifying -1).
- -hostfile | --hostfile
- The hostfile from which to restart the application. Useful
in unscheduled environments. (Same behavior as --machinefile option)
- -machinefile | --machinefile
- The machinefile from which to restart the application.
Useful in unscheduled environments. (Same behavior as --hostfile
option)
- -v | --verbose
- Enable verbose output for debugging.
- -gmca | --gmca <key> <value>
- Pass global MCA parameters that are applicable to all
contexts. <key> is the parameter name; <value>
is the parameter value.
- -mca | --mca <key> <value>
- Send arguments to various MCA modules.
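The options above can be combined as in the following minimal sketch. It is illustrative only: the snapshot handle and the hostfile path are hypothetical placeholders, and the show helper is not part of Open MPI. It merely prints each command line so the sketch can be run without a checkpoint on disk. Note that the global snapshot handle is always the last argument.

```shell
#!/bin/sh
# Hypothetical handle; substitute the reference printed by ompi-checkpoint.
SNAPSHOT="ompi_global_snapshot_1234.ckpt"

# Dry-run helper (an assumption of this sketch, not an Open MPI tool):
# prints the command line instead of executing it.
show() { printf '%s\n' "$*"; }

# Restart from the most recent checkpoint (the default).
show ompi-restart "$SNAPSHOT"

# Restart from a specific sequence number with verbose output.
# Sequence number 2 is assumed to exist for this snapshot.
show ompi-restart -v -s 2 "$SNAPSHOT"

# Preload the checkpoint files on the remote systems and restart on the
# hosts listed in ./hosts (hypothetical path).
show ompi-restart --preload --hostfile ./hosts "$SNAPSHOT"
```

To actually restart a job, drop the show prefix and run the commands directly.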
DESCRIPTION¶
ompi-restart can be invoked multiple times, provided the invocations do
not overlap. Each invocation restarts a previously checkpointed parallel job.
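One way to keep invocations from overlapping is to run them in the foreground, one after another, as in the sketch below. It assumes a hypothetical snapshot handle with sequence numbers 0 and 1 available. The RESTART_CMD variable is an assumption of this sketch: by default it only echoes the command line (a dry run), and it can be set to ompi-restart to perform real restarts.

```shell
#!/bin/sh
# Hypothetical handle; substitute the reference printed by ompi-checkpoint.
SNAPSHOT="ompi_global_snapshot_1234.ckpt"

# Dry run by default; set RESTART_CMD=ompi-restart to restart for real.
RESTART_CMD="${RESTART_CMD:-echo ompi-restart}"

# restart_from SEQ HANDLE
# Runs one restart in the foreground, with the handle as the last argument.
restart_from() {
    $RESTART_CMD -s "$1" "$2"
}

# Each restart is started only after the previous one has returned,
# so the invocations never overlap. Stop at the first failure.
for seq in 0 1; do
    restart_from "$seq" "$SNAPSHOT" || break
done
```

Because --fork is not used here, each call returns only once the restarted job has finished, which is what keeps the runs sequential.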
SEE ALSO¶
orte-ps(1),
orte-clean(1),
ompi-checkpoint(1),
opal-checkpoint(1),
opal-restart(1),
opal_crs(7)