OPENSM(8) | OpenIB Management | OPENSM(8) |
NAME¶
opensm - InfiniBand subnet manager and administration (SM/SA)
SYNOPSIS¶
opensm [--version] [-F | --config <file_name>] [-c(reate-config) <file_name>] [-g(uid) <GUID in hex>] [-l(mc) <LMC>] [-p(riority) <PRIORITY>] [-smkey <SM_Key>] [-r(eassign_lids)] [-R <engine name(s)> | --routing_engine <engine name(s)>] [-A | --ucast_cache] [-z | --connect_roots] [-M <file name> | --lid_matrix_file <file name>] [-U <file name> | --lfts_file <file name>] [-S | --sadb_file <file name>] [-a | --root_guid_file <path to file>] [-u | --cn_guid_file <path to file>] [-X | --guid_routing_order_file <path to file>] [-m | --ids_guid_file <path to file>] [-o(nce)] [-s(weep) <interval>] [-t(imeout) <milliseconds>] [-maxsmps <number>] [-console [off | local | socket | loopback]] [-console-port <port>] [-i(gnore-guids) <equalize-ignore-guids-file>] [-f <log file path> | --log_file <log file path>] [-L | --log_limit <size in MB>] [-e(rase_log_file)] [-P(config) <partition config file>] [-N | --no_part_enforce] [-Q | --qos [-Y | --qos_policy_file <file name>]] [-y | --stay_on_fatal] [-B | --daemon] [-I | --inactive] [--perfmgr] [--perfmgr_sweep_time_s <seconds>] [--prefix_routes_file <path>] [--consolidate_ipv6_snm_req] [-v(erbose)] [-V] [-D <flags>] [-d(ebug) <number>] [-h(elp)] [-?]
DESCRIPTION¶
opensm is an InfiniBand-compliant Subnet Manager and Administration, and runs on top of OpenIB.
OPTIONS¶
- --version
- Prints OpenSM version and exits.
- -F, --config <config file>
- The name of the OpenSM config file. When not specified, /etc/opensm/opensm.conf is used (if it exists).
- -c, --create-config <file name>
- OpenSM will dump its configuration to the specified file and exit. This is a way to generate an OpenSM configuration file template.
- -g, --guid <GUID in hex>
- This option specifies the local port GUID value with which OpenSM should bind. OpenSM can be bound to only one port at a time. If the given GUID is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port.
- -l, --lmc <LMC value>
- This option specifies the subnet's LMC value. The number of LIDs assigned to each port is 2^LMC. The LMC value must be in the range 0-7. LMC values > 0 allow multiple paths between ports. LMC values > 0 should only be used if the subnet topology actually provides multiple paths between ports, i.e. multiple interconnects between switches. Without -l, OpenSM defaults to LMC = 0, which allows one path between any two ports.
- -p, --priority <Priority value>
- This option specifies the SM's PRIORITY. This affects handover cases, where the master is chosen by priority and GUID. The range is from 0 (default and lowest priority) to 15 (highest).
- -smkey <SM_Key value>
- This option specifies the SM's SM_Key (64 bits). This affects SM authentication. Note that OpenSM version 3.2.1 and below used the default value '1' in host byte order; this is now fixed, but you may need this option to interoperate with an old OpenSM running on a little-endian machine.
- -r, --reassign_lids
- This option causes OpenSM to reassign LIDs to all end nodes. Specifying -r on a running subnet may disrupt subnet traffic. Without -r, OpenSM attempts to preserve existing LID assignments, resolving any duplicate use of the same LID.
- -R, --routing_engine <Routing engine names>
- This option chooses the routing engine(s) to use instead of the Min Hop algorithm (the default). Multiple routing engines can be specified as a comma-separated list; they are tried in the given order, and a later engine is used only if the earlier ones fail. Supported engines: minhop, updn, file, ftree, lash, dor
- -A, --ucast_cache
- This option enables the unicast routing cache, which prevents routing recalculation (a heavy task in a large cluster) when no topology change was detected during the heavy sweep, or when the topology change does not require new routing calculation, e.g. when one or more CAs/RTRs/leaf switches go down, or one or more of these nodes come back after being down. A very common case handled by the unicast routing cache is host reboot, which would otherwise cause two full routing recalculations: one when the host goes down, and the other when the host comes back online.
- -z, --connect_roots
- This option forces the routing engine (currently Up/Down only) to provide connectivity between root switches, making it fully IBA compliant. In many cases this can violate the "pure" deadlock-free property of the algorithm, so use it carefully.
- -M, --lid_matrix_file <file name>
- This option specifies the name of the LID matrix dump file from which switch LID matrices (min hop tables) will be loaded.
- -U, --lfts_file <file name>
- This option specifies the name of the LFTs file from which switch forwarding tables will be loaded.
- -S, --sadb_file <file name>
- This option specifies the name of the SA DB dump file from which the SA database will be loaded.
- -a, --root_guid_file <file name>
- Set the root nodes for the Up/Down or Fat-Tree routing algorithm to the GUIDs provided in the given file (one per line).
- -u, --cn_guid_file <file name>
- Set the compute nodes for the Fat-Tree routing algorithm to the GUIDs provided in the given file (one per line).
- -m, --ids_guid_file <file name>
- Name of the map file with a set of IDs that the Up/Down routing algorithm will use instead of node GUIDs (format: <guid> <id> per line).
- -X, --guid_routing_order_file <file name>
- Set the order in which port GUIDs are routed by the MinHop and Up/Down routing algorithms to the order of the GUIDs provided in the given file (one per line).
- -o, --once
- This option causes OpenSM to configure the subnet once, then exit. Ports remain in the ACTIVE state.
- -s, --sweep <interval value>
- This option specifies the number of seconds between subnet sweeps. Specifying -s 0 disables sweeping. Without -s, OpenSM defaults to a sweep interval of 10 seconds.
- -t, --timeout <value>
- This option specifies the time in milliseconds used for transaction timeouts. Specifying -t 0 disables timeouts. Without -t, OpenSM defaults to a timeout value of 200 milliseconds.
- -maxsmps <number>
- This option specifies the number of VL15 SMP MADs allowed on the wire at any one time. Specifying -maxsmps 0 allows unlimited outstanding SMPs. Without -maxsmps, OpenSM defaults to a maximum of 4 outstanding SMPs.
- -console [off | local | socket | loopback]
- This option brings up the OpenSM console (default off). Note that the socket and loopback options will only be available if OpenSM was built with --enable-console-socket.
- -console-port <port>
- Specify an alternate telnet port for the socket console (default 10000). Note that this option only appears if OpenSM was built with --enable-console-socket.
- -i, -ignore-guids <equalize-ignore-guids-file>
- This option provides the means to define a set of ports (by node GUID and port number) that will be ignored by the link load equalization algorithm.
- -x, --honor_guid2lid
- This option forces OpenSM to honor the guid2lid file when it comes out of Standby state, if such a file exists under OSM_CACHE_DIR and is valid. By default, this is FALSE.
- -f, --log_file <file name>
- This option sends the log to the given file. By default, the log goes to /var/log/opensm.log. For the log to go to standard output, use -f stdout.
- -L, --log_limit <size in MB>
- This option defines the maximum log file size in MB. When specified, the log file will be truncated upon reaching this limit.
- -e, --erase_log_file
- This option causes the log file to be deleted (if it already exists). By default, the log file is accumulative.
- -P, --Pconfig <partition config file>
- This option defines the optional partition configuration file. The default name is /etc/opensm/partitions.conf.
- --prefix_routes_file <file name>
- Prefix routes control how the SA responds to path record queries for off-subnet DGIDs. By default, the SA fails such queries. The PREFIX ROUTES section below describes the format of the configuration file. The default path is /etc/opensm/prefix-routes.conf.
- -Q, --qos
- This option enables QoS setup. It is disabled by default.
- -Y, --qos_policy_file <file name>
- This option defines the optional QoS policy file. The default name is /etc/opensm/qos-policy.conf.
- -N, --no_part_enforce
- This option disables partition enforcement on switch external ports.
- -y, --stay_on_fatal
- This option prevents the SM from exiting on fatal initialization issues, such as when the SM discovers duplicated GUIDs or a badly configured 12x link with lane reversal. By default, the SM exits on these errors.
- -B, --daemon
- Run in daemon mode - OpenSM will run in the background.
- -I, --inactive
- Start the SM in the inactive rather than the init SM state. This option can be used in conjunction with the perfmgr to run a standalone performance manager without SM/SA. However, this is NOT currently implemented in the performance manager.
- --perfmgr
- Enable the perfmgr. Only takes effect if --enable-perfmgr was specified at configure time.
- --perfmgr_sweep_time_s <seconds>
- Specify the sweep time for the performance manager in seconds (default is 180 seconds). Only takes effect if --enable-perfmgr was specified at configure time.
- --consolidate_ipv6_snm_req
- Consolidate IPv6 Solicited Node Multicast group join requests into one multicast group per MGID PKey.
- -v, --verbose
- This option increases the log verbosity level. The -v option may be specified multiple times to further increase the verbosity level. See the -D option for more information about log verbosity.
- -V
- This option sets the maximum verbosity level and forces log flushing. The -V option is equivalent to '-D 0xFF -d 2'. See the -D option for more information about log verbosity.
- -D <value>
- This option sets the log verbosity level. A flags field
must follow the -D option. A bit set/clear in the flags enables/disables a
specific log level as follows:
BIT LOG LEVEL ENABLED
---- -----------------
0x01 - ERROR (error messages)
0x02 - INFO (basic messages, low volume)
0x04 - VERBOSE (interesting stuff, moderate volume)
0x08 - DEBUG (diagnostic, high volume)
0x10 - FUNCS (function entry/exit, very high volume)
0x20 - FRAMES (dumps all SMP and GMP frames)
0x40 - ROUTING (dump FDB routing information)
0x80 - currently unused.
- -d, --debug <value>
- This option specifies a debug option. These options are not
normally needed. The number following -d selects the debug option to
enable as follows:
OPT Description
--- -----------------
-d0 - Ignore other SM nodes
-d1 - Force single threaded dispatching
-d2 - Force log flushing after each log message
-d3 - Disable multicast support
- -h, --help
- Display this usage info then exit.
- -?
- Display this usage info then exit.
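The -l option's arithmetic (2^LMC LIDs per port, LMC in 0-7) can be sketched as follows. This is a minimal Python illustration, not OpenSM code: the helper name lids_for_port is hypothetical, and it assumes the usual IBA rules that a port's base LID has its low LMC bits clear and that unicast LIDs lie in 0x0001-0xBFFF.

```python
def lids_for_port(base_lid, lmc):
    """Return the block of LIDs answered by one end port (hypothetical helper)."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0-7")
    count = 1 << lmc  # 2^LMC LIDs per port
    if base_lid % count != 0:
        # Assumption: the base LID is aligned so that its low LMC bits are zero.
        raise ValueError("base LID must be a multiple of 2^LMC")
    last = base_lid + count - 1
    if base_lid < 0x0001 or last > 0xBFFF:
        raise ValueError("LID block falls outside the unicast range 0x0001-0xBFFF")
    return list(range(base_lid, last + 1))


print(lids_for_port(8, 2))  # [8, 9, 10, 11]
```

With LMC = 0 every port gets exactly one LID, which matches the single path between any two ports described for the default.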
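The -p option states only that the master is chosen by priority and GUID. Assuming the IBA tie-break rule (the higher priority wins, and between equal priorities the lower port GUID wins), the choice can be sketched as below. SmCandidate and elect_master are hypothetical names, and a real handover also involves SM state exchange that this sketch does not model.

```python
from typing import NamedTuple


class SmCandidate(NamedTuple):
    priority: int  # 0 (lowest, default) to 15 (highest), as set with -p
    guid: int      # port GUID of the SM


def elect_master(candidates):
    """Pick the master SM: highest priority wins; on a tie, the lowest GUID wins."""
    for c in candidates:
        if not 0 <= c.priority <= 15:
            raise ValueError(f"priority {c.priority} is outside 0-15")
    # Negating the priority lets min() order by "highest priority, then lowest GUID".
    return min(candidates, key=lambda c: (-c.priority, c.guid))


sms = [SmCandidate(0, 0x0002C90300000010),
       SmCandidate(5, 0x0002C90300000030),
       SmCandidate(5, 0x0002C90300000020)]
print(hex(elect_master(sms).guid))  # 0x2c90300000020
```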
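The -smkey note about host byte order can be made concrete. Assuming the old OpenSM placed its in-memory key on the wire unchanged, the default value 1 stored on a little-endian host reads as 0x0100000000000000 in network (big-endian) order, which is presumably the value to pass with -smkey when interoperating with such an SM. The sketch only demonstrates the byte-order arithmetic; key_seen_in_network_order is a hypothetical helper.

```python
import struct


def key_seen_in_network_order(value):
    """Reinterpret a 64-bit key stored in little-endian host order as a big-endian value."""
    host_bytes = struct.pack("<Q", value)      # memory layout on a little-endian host
    return struct.unpack(">Q", host_bytes)[0]  # the same bytes read in network byte order


legacy_key = key_seen_in_network_order(1)
print(f"0x{legacy_key:016x}")  # 0x0100000000000000
```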
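The -D flags field is a plain bitmask: 0x03 enables ERROR and INFO, and the 0xFF used by -V enables every defined level. The Python sketch below decodes and builds such values from the -D table; the helper names are hypothetical and not part of OpenSM.

```python
# Bit assignments copied from the -D table; 0x80 is currently unused.
LOG_LEVELS = {
    0x01: "ERROR",
    0x02: "INFO",
    0x04: "VERBOSE",
    0x08: "DEBUG",
    0x10: "FUNCS",
    0x20: "FRAMES",
    0x40: "ROUTING",
}


def decode_log_flags(flags):
    """List the log levels enabled by a -D flags value."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("the -D table defines only eight bits")
    return [name for bit, name in LOG_LEVELS.items() if flags & bit]


def encode_log_flags(names):
    """Build a -D flags value from level names."""
    bit_for = {name: bit for bit, name in LOG_LEVELS.items()}
    flags = 0
    for name in names:
        flags |= bit_for[name]
    return flags


print(hex(encode_log_flags(["ERROR", "INFO", "VERBOSE"])))  # 0x7
print(decode_log_flags(0x43))  # ['ERROR', 'INFO', 'ROUTING']
```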
ENVIRONMENT VARIABLES¶
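The following environment variables control opensm behavior:
OSM_CACHE_DIR - directory that holds the opensm cache files (see -x, --honor_guid2lid), including:
guid2lid - stores the LID range assigned to each GUID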
NOTES¶
When opensm receives a HUP signal, it starts a new heavy sweep as if a trap was received or a topology change was found. Also, SIGUSR1 can be used to trigger a reopen of /var/log/opensm.log for logrotate purposes.
PARTITION CONFIGURATION¶
The default name of the OpenSM partitions configuration file is /etc/opensm/partitions.conf. The default may be changed by using the --Pconfig (-P) option with OpenSM.
PartitionName - string, used in logging. When omitted, an
empty string is used.
PKey - P_Key value for this partition. Only the low 15 bits
are used. When omitted, it is autogenerated.
flag - used to indicate IPoIB capability of this partition.
defmember=full|limited - specifies default membership for port guid
list. Default is limited.
ipoib - indicates that this partition may be used for IPoIB; as a
result, an IPoIB-capable MC group will be created.
rate=<val> - specifies rate for this IPoIB MC group
(default is 3 (10 Gbps))
mtu=<val> - specifies MTU for this IPoIB MC group
(default is 4 (2048))
sl=<val> - specifies SL for this IPoIB MC group
(default is 0)
scope=<val> - specifies scope for this IPoIB MC group
(default is 2 (link local)). Multiple scope settings
are permitted for a partition.
PortGUID - GUID of partition member EndPort. Hexadecimal
numbers should start from 0x, decimal numbers
are accepted too.
full or limited - indicates full or limited membership for this
port. When omitted (or unrecognized) limited
membership is assumed.
- 'ALL' means all end ports in this subnet.
- 'SELF' means subnet manager's port.
Default=0x7fff : ALL, SELF=full ;
NewPartition , ipoib : 0x123456=full, 0x3456789034=limi, 0x2134af2306 ;
YetAnotherOne = 0x300 : SELF=full ;
YetAnotherOne = 0x300 : ALL=limited ;
ShareIO = 0x80 , defmember=full : 0x123451, 0x123452;
# 0x123453, 0x123454 will be limited
ShareIO = 0x80 : 0x123453, 0x123454, 0x123455=full;
# 0x123456, 0x123457 will be limited
ShareIO = 0x80 , defmember=limited : 0x123456, 0x123457, 0x123458=full;
ShareIO = 0x80 , defmember=full : 0x123459, 0x12345a;
ShareIO = 0x80 , defmember=full : 0x12345b, 0x12345c=limited, 0x12345d;
Default=0x7fff,ipoib:ALL=full;
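The entry format above can be illustrated with a minimal parser. This is a sketch inferred from the field descriptions and examples in this section, not OpenSM's own parser: it handles one complete entry per line, does not handle wrapped entries, and assumes that an explicit but unrecognized membership (such as 0x3456789034=limi) falls back to limited even under defmember=full. The hypothetical helper wire_pkey assumes the IBA convention that bit 15 of a P_Key marks full membership.

```python
def parse_partition_line(line):
    """Parse one partitions.conf entry into (name, pkey, flags, members)."""
    line = line.split("#", 1)[0].strip()  # drop comments
    if not line:
        return None
    line = line.rstrip(";").strip()
    definition, _, member_list = line.partition(":")
    fields = [f.strip() for f in definition.split(",")]

    name, _, pkey_text = fields[0].partition("=")
    pkey_text = pkey_text.strip()
    # Only the low 15 bits are used; None means "autogenerate".
    pkey = int(pkey_text, 0) & 0x7FFF if pkey_text else None

    flags, default_membership = {}, "limited"
    for field in fields[1:]:
        key, has_value, value = field.partition("=")
        key, value = key.strip(), value.strip()
        if key == "defmember":
            default_membership = value
        else:
            flags[key] = value if has_value else True

    members = []
    for entry in member_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        guid, has_membership, membership = entry.partition("=")
        membership = membership.strip()
        if not has_membership:
            membership = default_membership
        elif membership not in ("full", "limited"):
            membership = "limited"  # unrecognized values fall back to limited
        members.append((guid.strip(), membership))
    return name.strip(), pkey, flags, members


def wire_pkey(pkey, membership):
    """Set bit 15 for full members (IBA P_Key membership bit)."""
    return (pkey & 0x7FFF) | (0x8000 if membership == "full" else 0)


print(parse_partition_line("Default=0x7fff,ipoib:ALL=full;"))
# ('Default', 32767, {'ipoib': True}, [('ALL', 'full')])
```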
QOS CONFIGURATION¶
There is a set of QoS-related low-level configuration parameters. All of these parameter names are prefixed by the "qos_" string. Here is a full list of these parameters:
qos_max_vls - The maximum number of VLs that will be on the subnet
qos_high_limit - The limit of High Priority component of VL
Arbitration table (IBA 7.6.9)
qos_vlarb_low - Low priority VL Arbitration table (IBA 7.6.9)
template
qos_vlarb_high - High priority VL Arbitration table (IBA 7.6.9)
template
Both VL arbitration templates are pairs of
VL and weight
qos_sl2vl - SL2VL Mapping table (IBA 7.6.6) template. It is
a list of VLs corresponding to SLs 0-15 (Note
that VL15 used here means drop this SL)
qos_max_vls 15
qos_high_limit 0
qos_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
qos_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
qos_ca_ - QoS configuration parameters set for CAs.
qos_rtr_ - parameters set for routers.
qos_sw0_ - parameters set for switches' port 0.
qos_swe_ - parameters set for switches' external ports.
qos_sw0_max_vls=2
qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
qos_swe_high_limit=0
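The templates above use two simple syntaxes: comma-separated VL:weight pairs for the arbitration tables, and a comma-separated list of VLs indexed by SL for qos_sl2vl, where VL15 means drop. A minimal Python sketch of parsing them follows. The helper names are hypothetical, and the 0-14 data VL and 0-255 weight bounds are assumptions taken from the IBA table formats rather than from this page.

```python
def parse_vlarb(template):
    """Parse a qos_vlarb_* template such as "0:4,1:0" into (VL, weight) pairs."""
    entries = []
    for pair in template.split(","):
        if not pair.strip():
            continue
        vl_text, weight_text = pair.split(":")
        vl, weight = int(vl_text), int(weight_text)
        if not 0 <= vl <= 14:
            raise ValueError(f"data VL {vl} is outside 0-14")
        if not 0 <= weight <= 255:
            raise ValueError(f"weight {weight} is outside 0-255")
        entries.append((vl, weight))
    return entries


def parse_sl2vl(template):
    """Parse a qos_sl2vl template into {SL: VL}, with None meaning "drop" (VL15)."""
    vls = [int(v) for v in template.split(",") if v.strip()]
    if len(vls) > 16:
        raise ValueError("at most 16 SLs (0-15) can be mapped")
    for vl in vls:
        if not 0 <= vl <= 15:
            raise ValueError(f"VL {vl} is outside 0-15")
    return {sl: (None if vl == 15 else vl) for sl, vl in enumerate(vls)}


low = parse_vlarb("0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4")
print(sum(weight for _, weight in low))  # 56
print(parse_sl2vl("0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7")[15])  # 7
```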
PREFIX ROUTES¶
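The first-match rules of this section can be sketched in Python as follows. This is an illustration, not opensm's implementation: parse_prefix_routes and select_router are hypothetical names, the available_routers list stands in for whatever router ports opensm currently considers usable, and a wildcarded GUID is assumed to pick the first entry of that list.

```python
WILDCARD = None  # stands for "*" in either column


def _parse_field(text):
    if text == "*":
        return WILDCARD
    value = int(text, 16)  # hex; int() accepts an optional 0x prefix with base 16
    if not 0 <= value < 1 << 64:
        raise ValueError(f"{text} is not a 64-bit value")
    return value


def parse_prefix_routes(text):
    """Return (prefix, guid) pairs in file order."""
    routes = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # blank lines are ignored
        prefix_text, guid_text = line.split()
        routes.append((_parse_field(prefix_text), _parse_field(guid_text)))
    return routes


def select_router(routes, dgid_prefix, available_routers):
    """Return the router port GUID for an off-subnet prefix, or None to fail the query."""
    for prefix, guid in routes:
        if prefix is not WILDCARD and prefix != dgid_prefix:
            continue
        if guid is WILDCARD:
            if available_routers:
                return available_routers[0]
        elif guid in available_routers:
            return guid
    return None


config = """
# prefix            router port GUID
0x0011223344556677  0x0002c90300001234
0x0011223344556677  0x0002c90300005678   # second router for the same prefix
*                   *                    # anything else: first available router
"""
routes = parse_prefix_routes(config)
print(hex(select_router(routes, 0x0011223344556677, [0x0002C90300005678])))
```

Because the lookup walks the lines in order, moving the "* *" line to the top would make the first two lines unreachable, which is the ordering pitfall this section warns about.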
Prefix routes control how the SA responds to path record queries for off-subnet DGIDs. By default, the SA fails such queries. Note that IBA does not specify how the SA should obtain off-subnet path record information. The prefix routes configuration is meant as a stop-gap until the specification is completed. Each line in the configuration file is a 64-bit prefix followed by a 64-bit GUID, separated by white space. The GUID specifies the router port on the local subnet that will handle the prefix. Blank lines are ignored, as is anything between a # character and the end of the line. The prefix and GUID are both in hex; the leading 0x is optional. Either or both can be wildcarded by specifying an asterisk instead of an explicit prefix or GUID. When responding to a path record query for an off-subnet DGID, opensm searches for the first prefix match in the configuration file. Therefore, the order of the lines in the configuration file is important: a wildcarded prefix at the beginning of the configuration file renders all subsequent lines useless. If there is no match, opensm fails the query. It is legal to repeat prefixes in the configuration file; opensm will return the path to the first available matching router. A configuration file with a single line where both prefix and GUID are wildcarded means that a path record query specifying any off-subnet DGID should return a path to the first available router. This configuration yields the same behaviour formerly achieved by compiling opensm with -DROUTER_EXP.
ROUTING¶
OpenSM now offers five routing engines (see the -R option for the supported engine names).
The basic routing algorithm consists of two stages:
1. MinHop matrix calculation: how many hops are required to get from each port to each LID?
The algorithm to fill these tables differs between standard (Min Hop) and Up/Down routing.
For standard routing, a "relaxation" algorithm is used to propagate the min hop count from every destination LID through neighbor switches.
For Up/Down routing, a BFS from every target is used. The BFS tracks link direction (up or down) and avoids steps that would go up after a down step was used.
2. Output port selection: each switch is visited and, for each target LID, the port used to reach that LID is chosen. This step is common to standard and Up/Down routing. Each port has a counter counting the number of target LIDs going through it.
When there are multiple alternative ports with the same MinHop to a LID, the one with fewer previously assigned target LIDs is selected.
If LMC > 0, more checks are added. Within each group of LIDs assigned to the same target port:
a. use only ports which have the same MinHop
b. first prefer the ones that go to a different systemImageGuid than the previous LID of the same LMC group
c. if none, prefer those which go through another NodeGuid
d. fall back to the number of paths method (if all go to the same node).
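The two stages for standard (Min Hop) routing can be sketched as follows. This is a simplified Python illustration, not OpenSM's code: end nodes stand in for target LIDs (LMC = 0, so rules a-d are not modeled), end nodes never forward traffic, and ties between equally loaded ports go to the lowest port number, which is an assumption.

```python
from collections import defaultdict

INF = float("inf")


def min_hop_tables(switch_links, destinations):
    """Stage 1: relax hop counts until no switch's table changes.

    switch_links maps each switch to {port number: neighbor}, where a neighbor
    is another switch or an end node; destinations lists end-node names.
    """
    hops = {sw: {dest: INF for dest in destinations} for sw in switch_links}
    changed = True
    while changed:
        changed = False
        for sw, ports in switch_links.items():
            for neighbor in ports.values():
                for dest in destinations:
                    if neighbor == dest:
                        candidate = 1
                    elif neighbor in switch_links:
                        candidate = hops[neighbor][dest] + 1
                    else:
                        continue  # end nodes do not forward traffic
                    if candidate < hops[sw][dest]:
                        hops[sw][dest] = candidate
                        changed = True
    return hops


def build_lfts(switch_links, destinations, hops):
    """Stage 2: per switch, send each target through the least-loaded min-hop port."""
    lfts = {}
    for sw, ports in switch_links.items():
        load = defaultdict(int)  # target LIDs already assigned to each port
        table = {}
        for dest in destinations:
            best = hops[sw][dest]
            if best == INF:
                continue  # unreachable from this switch
            candidates = [
                port for port, neighbor in sorted(ports.items())
                if neighbor == dest
                or (neighbor in switch_links and hops[neighbor][dest] == best - 1)
            ]
            port = min(candidates, key=lambda p: load[p])  # first minimum = lowest port
            load[port] += 1
            table[dest] = port
        lfts[sw] = table
    return lfts


# Two leaf switches (L1, L2) joined by two spines (S1, S2); A, C, D are end nodes.
switch_links = {
    "L1": {1: "A", 3: "S1", 4: "S2"},
    "L2": {1: "C", 2: "D", 3: "S1", 4: "S2"},
    "S1": {1: "L1", 2: "L2"},
    "S2": {1: "L1", 2: "L2"},
}
destinations = ["A", "C", "D"]
hops = min_hop_tables(switch_links, destinations)
lfts = build_lfts(switch_links, destinations, hops)
print(lfts["L1"])  # {'A': 1, 'C': 3, 'D': 4}
```

On L1, targets C and D are equally far through either spine, so the per-port counters spread them over ports 3 and 4 instead of stacking both on port 3.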
The -r (--reassign_lids) option causes OpenSM to reassign LIDs to all
end nodes. Specifying -r on a running subnet
may disrupt subnet traffic.
Without -r, OpenSM attempts to preserve existing
LID assignments, resolving any duplicate use of the same LID.
The -i (--ignore-guids) option provides the means to define a set of ports
(by GUID) that will be ignored by the link load
equalization algorithm. Note that only end ports (CA,
switch port 0, and router ports) are supported, not switch
external ports.
Note 1: The user can override the automatically detected root node list manually (see -a, --root_guid_file).
Note 2: If root node detection cannot find any root nodes, and the user did
not specify a GUID list file, OpenSM falls back to the
Min Hop routing algorithm.
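The Up/Down rule (a BFS from each target that never takes an up step after a down step) can be sketched as follows. This Python illustration is not OpenSM's implementation: it assumes switch ranks are BFS distances from the root switches (as could be supplied with -a, --root_guid_file), and it orders switches of equal rank by name, a hypothetical stand-in for whatever ordering the real engine uses. Because reversing an up*down* path yields another up*down* path, searching outward from the target with the same rule gives valid forward paths.

```python
from collections import deque


def rank_switches(adjacency, roots):
    """Rank = BFS distance from the nearest root switch."""
    rank = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in rank:
                rank[neighbor] = rank[node] + 1
                queue.append(neighbor)
    return rank


def updn_min_hops(adjacency, rank, target):
    """BFS outward from target over (switch, down_step_taken) states."""
    def is_up(a, b):
        # Toward a lower rank is "up"; equal ranks are ordered by name (assumption).
        return (rank[b], b) < (rank[a], a)

    start = (target, False)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node, went_down = queue.popleft()
        for neighbor in adjacency[node]:
            up = is_up(node, neighbor)
            if up and went_down:
                continue  # forbidden: an up step after a down step
            state = (neighbor, went_down or not up)
            if state not in dist:
                dist[state] = dist[(node, went_down)] + 1
                queue.append(state)
    hops = {}
    for (node, _), d in dist.items():
        hops[node] = min(d, hops.get(node, d))
    return hops


adjacency = {
    "R": ["A", "B"],
    "A": ["R", "C"],
    "B": ["R", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
rank = rank_switches(adjacency, ["R"])
print(updn_min_hops(adjacency, rank, "C")["B"])  # 3: the 2-hop path B-D-C goes down then up
```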
- Tree rank should be between two and eight (inclusive)
- Switches of the same rank should have the same number
of UP-going port groups, unless they are root switches,
in which case they shouldn't have UP-going ports at all.
- Switches of the same rank should have the same number
of DOWN-going port groups, unless they are leaf switches.
- Switches of the same rank should have the same number
of ports in each UP-going port group.
- Switches of the same rank should have the same number
of ports in each DOWN-going port group.
- All the CAs have to be at the same tree level (rank).
- Tree rank should be between two and eight (inclusive)
- All the Compute Nodes (see -u, --cn_guid_file) have to be at the same tree level (rank).
Note that non-compute node CAs are allowed here to be at different
tree ranks.
- the file routing engine (-R file) will load switch LFTs and/or LID matrices (min hop tables)
- this will load switch LFTs according to the path entries introduced
in the file
- no additional checks will be performed (such as "is port connected",
etc.)
- if fabric LIDs have changed, this will try to reconstruct
LFTs correctly, provided endport GUIDs are represented in the file
(to disable this, GUIDs may be removed from the file
or zeroed)
opensm -R file -U /path/to/lfts_file
opensm -R file -M ./opensm-lid-matrix.dump
FILES¶
- /etc/opensm/opensm.conf
- default OpenSM config file.
- /etc/opensm/ib-node-name-map
- default node name map file. See ibnetdiscover for more
information on format.
- /etc/opensm/partitions.conf
- default partition config file
- /etc/opensm/qos-policy.conf
- default QOS policy config file
- /etc/opensm/prefix-routes.conf
- default prefix routes file.
AUTHORS¶
- Hal Rosenstock
- <hal.rosenstock@gmail.com>
- Sasha Khapyorsky
- <sashak@voltaire.com>
- Eitan Zahavi
- <eitan@mellanox.co.il>
- Yevgeny Kliteynik
- <kliteyn@mellanox.co.il>
- Thomas Sodring
- <tsodring@simula.no>
- Ira Weiny
- <weiny2@llnl.gov>
June 13, 2008 | OpenIB |