NAME
ompi-checkpoint, orte-checkpoint - Checkpoint a running parallel process using
the Open MPI Checkpoint/Restart Service (CRS)
NOTE: ompi-checkpoint and orte-checkpoint are exact synonyms for each
other. Using either name results in identical behavior.
SYNOPSIS
ompi-checkpoint [ options ] <PID_OF_MPIRUN>
Options
orte-checkpoint attempts to notify a running parallel job (identified by
the PID of its mpirun process) that a checkpoint of the job has been
requested. A global snapshot handle reference is presented to the user;
this reference is passed to ompi-restart to restart the job.
- <PID_OF_MPIRUN>
- Process ID of the mpirun process.
- -h | --help
- Display help for this command.
- -w | --nowait
- Do not wait for the application to finish checkpointing
before returning.
- -s | --status
- Display status messages regarding the progression of the
checkpoint request.
- --term
- After checkpointing the running job, terminate it.
- -v | --verbose
- Enable verbose output for debugging.
- -gmca | --gmca <key> <value>
- Pass global MCA parameters that are applicable to all
contexts. <key> is the parameter name; <value>
is the parameter value.
- -mca | --mca <key> <value>
- Send arguments to various MCA modules.
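As an illustration of the options above, the following sketch shows a few plausible invocations. It uses only flags listed in this page. The PID value 12345 and the `show` helper are hypothetical and exist only for this example; `show` prints each command instead of running it, so the sketch can be tried on a machine without Open MPI installed.

```shell
#!/bin/sh
# Hypothetical PID of the mpirun process; replace with the real one.
MPIRUN_PID=12345

# Hypothetical helper: print the command line instead of executing it.
# To run the commands for real, call them directly without "show".
show() { echo "$@"; }

# Plain checkpoint request.
show ompi-checkpoint "$MPIRUN_PID"

# Display status messages while the checkpoint request progresses.
show ompi-checkpoint -s "$MPIRUN_PID"

# Checkpoint the job, then terminate it.
show ompi-checkpoint --term "$MPIRUN_PID"

# Return without waiting for the checkpoint to finish, with verbose output.
show ompi-checkpoint -w -v "$MPIRUN_PID"
```

Options precede the PID, matching the order given in the SYNOPSIS.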
DESCRIPTION
orte-checkpoint can be invoked multiple times on the same job, provided the
invocations do not overlap. The user does not need to specify the
checkpointer to be used here, as that is determined entirely by each of the
running processes in the job being checkpointed.
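One way to keep repeated invocations from overlapping is to issue them sequentially from a single loop. The sketch below is an assumption about how that might be scripted, not a documented procedure. `MPIRUN_PID`, `COUNT`, `INTERVAL`, the `CHECKPOINT` variable, and the `periodic_checkpoint` function are all hypothetical names. By default the script is a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical values for illustration only.
MPIRUN_PID=12345   # PID of the mpirun process
COUNT=3            # number of checkpoints to take
INTERVAL=0         # seconds to pause between checkpoints; raise for real use

# Dry run by default: prints the command instead of executing it.
# Set CHECKPOINT=ompi-checkpoint in the environment to run it for real.
CHECKPOINT=${CHECKPOINT:-echo ompi-checkpoint}

periodic_checkpoint() {
    i=1
    while [ "$i" -le "$COUNT" ]; do
        # -w is deliberately not used: without it the command waits for the
        # checkpoint to finish before returning, so the next request is only
        # issued after this one is done.
        $CHECKPOINT "$MPIRUN_PID" || return 1
        sleep "$INTERVAL"
        i=$((i + 1))
    done
}

periodic_checkpoint
```

The loop stops at the first failed request rather than issuing further checkpoints against a job that may be in an unknown state.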
SEE ALSO
orte-ps(1),
orte-clean(1),
ompi-restart(1),
opal-checkpoint(1),
opal-restart(1),
opal_crs(7)