NAME¶
smap - graphically view information about SLURM jobs, partitions, and set
configurations parameters.
SYNOPSIS¶
smap [
OPTIONS...]
DESCRIPTION¶
smap is used to graphically view job, partition and node information for
a system running SLURM. Note that information about nodes and partitions to
which a user lacks access will always be displayed to avoid obvious gaps in
the output. This is equivalent to the
--all option of the
sinfo
and
squeue commands.
OPTIONS¶
- -c, --commandline
- Print output to the commandline, no curses.
- -D <option>,
--display=<option>
- sets the display mode for smap. Showing revelant
information about specific views and displaying a corresponding node
chart. While in any display a user can switch by typing a different view
letter. This is true in all modes except for 'configure mode' user can
type 'quit' to exit just configure mode. Typing 'exit' will end the
configuration mode and exit smap. Note that unallocated nodes are
indicated by a '.' and nodes in the DOWN, DRAINED or FAIL state by a
'#'.
- b
- Displays information about BlueGene partitions on the
system
- c
- Displays current BlueGene node states and allows users to
configure the system.
- j
- Displays information about jobs running on system.
- r
- Display information about advanced reservations. While all
current and future reservations will be listed, only currently active
reservations will appear on the node map.
- s
- Displays information about slurm partitions on the
system
- -h, --noheader
- Do not print a header on the output.
- -H, --show_hidden
- Display hidden partitions and their jobs.
- --help,
- Print a message describing all smap options.
- -i <seconds> ,
--iterate=<seconds>
- Print the state on a periodic basis. Sleep for the
indicated number of seconds between reports. User can exit at anytime by
typing 'q' or hitting the return key. If user is in configure mode type
'exit' to exit program, 'quit' to exit configure mode.
- -I, --ionodes
- Only show objects with these ionodes this support is only
for bluegene systems. This should be used inconjuction with the '-n'
option. Only specify the ionode number range here. Specify the node name
with the '-n' option.
- -M, --clusters=<string>
- Clusters to issue commands to.
- -n, --nodes
- Only show objects with these nodes. If querying to the
ionode level use the option '-I' in conjunction with this option.
- -Q, --quiet
- Avoid printing error messages.
- -R <RACK_MIDPLANE_ID/XYZ>,
--resolve=<RACK_MIDPLANE_ID/XYZ>
- Returns the XYZ coords for a Rack/Midplane id or
vice-versa.
To get the XYZ coord for a Rack/Midplane id input -R R101 where 10 is the
rack and 1 is the midplane.
To get the Rack/Midplane id from a XYZ coord input -R 101 where X=1 Y=1 Z=1
with no leading 'R'.
- --usage
- Print a brief message listing the smap options.
- -V , --version
- Print version information and exit.
INTERACTIVE OPTIONS¶
When using smap in curses mode you can scroll through the different windows
using the arrow keys. The
up and
down arrow keys scroll the
window containing the grid, and the
left and
right arrow keys
scroll the window containing the text information.
To change screens when an iterate is set you can use any of the options
available to the
-D option listed above.
You can also hide or make visible hidden partitions by pressing
h at any
moment when an iterate is set.
OUTPUT FIELD DESCRIPTIONS¶
- ACCESS_CONTROL
- Identifies the users or bank accounts which can use this
advanced reservation. A prefix of "A:" indicates that the
following account names may use this reservation. A prefix of
"U:" indicates that the following user names may use this
reservation.
- AVAIL
- Partition state: up or down.
- BG_BLOCK
- BlueGene Block Name.
- CONN
- Connection Type: TORUS or MESH or
SMALL (for small blocks).
- END_TIME
- The time when an advanced reservation ended.
- ID
- Key to identify the nodes associated with this entity in
the node chart.
- MODE
- Mode Type: COPROCESS or VIRTUAL.
- NAME
- Name of the job or advanced reservation.
- NODELIST or BP_LIST
- Names of nodes or base partitions associated with this
configuration, partition or reservation.
- NODES
- Count of nodes or base partitions with this particular
configuration.
- PARTITION
- Name of a partition. Note that the suffix "*"
identifies the default partition.
- ST
- State of a job in compact form. Possible states include: PD
(pending), R (running), S (suspended), CD (completed), CF (configuring),
CG (completing), F (failed), TO (timeout), and NF (node failure). See
JOB STATE CODES section below for more information.
- START_TIME
- The time when an advanced reservation started.
- STATE
- State of the nodes. Possible states include: allocated,
completing, down, drained, draining, fail, failing, idle, and unknown plus
their abbreviated forms: alloc, comp, donw, drain, drng, fail, failg,
idle, and unk respectively. Note that the suffix "*" identifies
nodes that are presently not responding. See NODE STATE CODES
section below for more information.
- TIMELIMIT
- Maximum time limit for any user job in
days-hours:minutes:seconds. infinite is used to identify jobs or
partitions without a job time limit.
The node chart is designed to indicate relative locations of the nodes. On most
Linux clusters this will represent a one-dimensional array of nodes. Larger
clusters will utilize multiple as needed with right side of one line being
logically followed by the left side of the next line.
On BlueGene systems, the node chart will indicate the three
dimensional topography of the system.
The X dimension will increase from left to right on a given line.
The Y dimension will increase in planes from bottom to top.
The Z dimension will increase within a plane from the back
line to the front line of a plane.
Note the example below:
a a a a b b d d
a a a a b b d d
a a a a b b c c
a a a a b b c c
a a a a b b d d
a a a a b b d d
a a a a b b c c
a a a a b b c c
a a a a . . d d
a a a a . . d d
a a a a . . e e Y
a a a a . . e e |
|
a a a a . . d d 0----X
a a a a . . d d /
a a a a . . . . /
a a a a . . . # Z
ID JOBID PARTITION BG_BLOCK USER NAME ST TIME NODES BP_LIST
a 12345 batch RMP0 joseph tst1 R 43:12 32k bgl[000x333]
b 12346 debug RMP1 chris sim3 R 12:34 8k bgl[420x533]
c 12350 debug RMP2 danny job3 R 0:12 4k bgl[622x733]
d 12356 debug RMP3 dan colu R 18:05 8k bgl[600x731]
e 12378 debug RMP4 joseph asx4 R 0:34 2k bgl[612x713]
CONFIGURATION INSTRUCTIONS¶
For Admin use. From this screen one can create a configuration file that is used
to partition and wire the system into usable blocks.
- OUTPUT
-
- BG_BLOCK
- BlueGene Block Name.
- CONN
- Connection Type: TORUS or MESH or
SMALL (for small blocks).
- ID
- Key to identify the nodes associated with this entity in
the node chart.
- MODE
- Mode Type: COPROCESS or VIRTUAL.
- INPUT COMMANDS
- resolve <RACK_MIDPLANE_ID/XYZ>
- Returns the XYZ coords for a Rack/Midplane id or
vice-versa.
To get the XYZ coord for a Rack/Midplane id input -R R101 where 10 is the
rack and 1 is the midplane.
To get the Rack/Midplane id from a XYZ coord input -R 101 where X=1 Y=1 Z=1
with no leading 'R'.
- load <bluegene.conf file>
- Load an already exsistant bluegene.conf file. This will
varify and mapout a bluegene.conf file. After loaded the configuration may
be edited and saved as a new file.
- create <size> <options>
- Submit request for partition creation. The size may be
specified either as a count of base partitions or specific dimensions in
the X, Y and Z directions separated by "x", for example
"2x3x4". A variety of options may be specified. Valid options
are listed below. Note that the option and their values are case
insensitive (e.g. "MESH" and "mesh" are
equivalent).
- Start = XxYxZ
- Identify where to start the partition. This is primarily
for testing purposes. For convenience one can only put the X coord or XxY
will also work. The default value is 0x0x0.
- Connection = MESH | TORUS | SMALL
- Identify how the nodes should be connected in network. The
default value is TORUS.
- Small
- Equivalent to "Connection=Small". If a small
connection is specified the base partition chosen will create smaller
partitions based on options 32CNBlocks and 128CNBlocks
respectively for a Bluegene L system. 16CNBlocks,
64CNBlocks, and 256CNBlocks are also available for Bluegene
P systems. Keep in mind you must have enough ionodes to make all these
configurations possible.
These number will be altered to take up the entire base partition. Size
does not need to be specified with a small request, we will always default
to 1 base partition for allocation.
- Mesh
- Equivalent to "Connection=Mesh".
- Torus
- Equivalent to "Connection=Torus".
- Rotation = TRUE | FALSE
- Specifies that the geometry specified in the size parameter
may be rotated in space (e.g. the Y and Z dimensions may be switched). The
default value is FALSE.
- Rotate
- Equivalent to "Rotation=true".
- Elongation = TRUE | FALSE
- If TRUE, permit the geometry specified in the size
parameter to be altered as needed to fit available resources. For example,
an allocation of "4x2x1" might be used to satisfy a size
specification of "2x2x2". The default value is FALSE.
- Elongate
- Equivalent to "Elongation=true".
- copy <id> <count>
- Submit request for partition to be copied. You may copy a
specific partition by specifying its id, by default the last configured
partition is copied. You may also specify a number of copies to be made.
By default, one copy is made.
- delete <id>
- Delete the specified block.
- down <node_range>
- Down a specific node or range of nodes. i.e. 000, 000-111
[000x111]
- up <node_range>
- Bring a specific node or range of nodes up. i.e. 000,
000-111 [000x111]
- alldown
- Set all nodes to down state.
- allup
- Set all nodes to up state.
- save <file_name>
- Save the current configuration to a file. If no file_name
is specified, the configuration is written to a file named
"bluegene.conf" in the current working directory.
- clear
- Clear all partitions created.
NODE STATE CODES¶
Node state codes are shortened as required for the field size. If the node state
code is followed by "*", this indicates the node is presently not
responding and will not be allocated any new work. If the node remains
non-responsive, it will be placed in the
DOWN state (except in the case
of
COMPLETING,
DRAINED,
DRAINING,
FAIL,
FAILING nodes).
If the node state code is followed by "~", this indicates the node is
presently in a power saving mode (typically running at reduced frequency). If
the node state code is followed by "#", this indicates the node is
presently being powered up or configured.
- ALLOCATED
- The node has been allocated to one or more jobs.
- ALLOCATED+
- The node is allocated to one or more active jobs plus one
or more jobs are in the process of COMPLETING.
- COMPLETING
- All jobs associated with this node are in the process of
COMPLETING. This node state will be removed when all of the job's
processes have terminated and the SLURM epilog program (if any) has
terminated. See the Epilog parameter description in the
slurm.conf man page for more information.
- DOWN
- The node is unavailable for use. SLURM can automatically
place nodes in this state if some failure occurs. System administrators
may also explicitly place nodes in this state. If a node resumes normal
operation, SLURM can automatically return it to service. See the
ReturnToService and SlurmdTimeout parameter descriptions in
the slurm.conf(5) man page for more information.
- DRAINED
- The node is unavailable for use per system administrator
request. See the update node command in the scontrol(1) man
page or the slurm.conf(5) man page for more information.
- DRAINING
- The node is currently executing a job, but will not be
allocated to additional jobs. The node state will be changed to state
DRAINED when the last job on it completes. Nodes enter this state
per system administrator request. See the update node
command in the scontrol(1) man page or the slurm.conf(5) man
page for more information.
- FAIL
- The node is expected to fail soon and is unavailable for
use per system administrator request. See the update node command
in the scontrol(1) man page or the slurm.conf(5) man page
for more information.
- FAILING
- The node is currently executing a job, but is expected to
fail soon and is unavailable for use per system administrator request. See
the update node command in the scontrol(1) man page or the
slurm.conf(5) man page for more information.
- IDLE
- The node is not allocated to any jobs and is available for
use.
- MAINT
- The node is currently in a reservation with a flag value of
"maintainence".
- UNKNOWN
- The SLURM controller has just started and the node's state
has not yet been determined.
JOB STATE CODES¶
Jobs typically pass through several states in the course of their execution. The
typical states are
PENDING,
RUNNING,
SUSPENDED,
COMPLETING, and
COMPLETED. An explanation of each state follows.
- CA CANCELLED
- Job was explicitly cancelled by the user or system
administrator. The job may or may not have been initiated.
- CD COMPLETED
- Job has terminated all processes on all nodes.
- CG COMPLETING
- Job is in the process of completing. Some processes on some
nodes may still be active.
- CF CONFIGURING
- Job has been allocated resources, but are waiting for them
to become ready for use (e.g. booting).
- F FAILED
- Job terminated with non-zero exit code or other failure
condition.
- NF NODE_FAIL
- Job terminated due to failure of one or more allocated
nodes.
- PD PENDING
- Job is awaiting resource allocation.
- PR PREEMPTED
- Job terminated due to preemption.
- R RUNNING
- Job currently has an allocation.
- S SUSPENDED
- Job has an allocation, but execution has been
suspended.
- TO TIMEOUT
- Job terminated upon reaching its time limit.
ENVIRONMENT VARIABLES¶
The following environment variables can be used to override settings compiled
into smap.
- SLURM_CONF
- The location of the SLURM configuration file.
COPYING¶
Copyright (C) 2004-2007 The Regents of the University of California. Copyright
(C) 2008-2009 Lawrence Livermore National Security. Produced at Lawrence
Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All rights
reserved.
This file is part of SLURM, a resource management program. For details, see
<
http://www.schedmd.com/slurmdocs/>.
SLURM is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
SEE ALSO¶
scontrol(1),
sinfo(1),
squeue(1),
slurm_load_ctl_conf (3),
slurm_load_jobs (3),
slurm_load_node (3),
slurm_load_partitions (3),
slurm_reconfigure (3),
slurm_shutdown (3),
slurm_update_job (3),
slurm_update_node (3),
slurm_update_partition (3),
slurm.conf(5)