NAME
lamshrink - Shrink a LAM universe.
SYNOPSIS
lamshrink [-dhv] [-w delay] nodeid
OPTIONS
- -d
- Print detailed debugging information.
- -h
- Print useful information on this command.
- -v
- Be verbose.
- -w delay
- Notify processes on the doomed node and pause for delay seconds before
proceeding.
- nodeid
- Remove the LAM node with this ID.
DESCRIPTION
An existing LAM session, initiated by lamboot(1), can be shrunk to include
fewer nodes with lamshrink. One node is removed per invocation. At a minimum,
the node ID must be given on the command line. Once lamshrink completes, the
node ID is invalid across the remaining nodes (as can be seen by running
lamnodes(1)).
Existing application processes on the target node can be warned of the
impending shutdown with the -w option. A LAM signal (SIGFUSE) is sent to these
processes, and lamshrink then pauses for the given number of seconds before
removing the node. By default, SIGFUSE is ignored. A different handler can be
installed with ksignal(2).
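The warning pattern above can be sketched as follows. This is only an analogy in Python using POSIX signals: SIGUSR1 stands in for SIGFUSE, signal.signal() stands in for ksignal(2), and the ShutdownWarning class and install() helper are hypothetical names. It also assumes the application knows the delay passed to -w out of band, since nothing here says the signal carries it.

```python
import signal
import time


class ShutdownWarning:
    """Records that a shutdown warning arrived and tracks the remaining grace period."""

    def __init__(self, grace_seconds):
        # Assumption: the application is told the -w delay by some other means.
        self.grace_seconds = grace_seconds
        self.deadline = None  # monotonic time by which cleanup should finish

    def handle(self, signum, frame):
        # Keep the handler small: note the deadline once and let the main
        # loop do the actual cleanup. Repeated warnings do not extend it.
        if self.deadline is None:
            self.deadline = time.monotonic() + self.grace_seconds

    @property
    def warned(self):
        return self.deadline is not None

    def seconds_left(self):
        # None until a warning has been received.
        if self.deadline is None:
            return None
        return max(0.0, self.deadline - time.monotonic())


def install(warning, signum=signal.SIGUSR1):
    # SIGUSR1 is a stand-in for SIGFUSE, which LAM delivers through its own
    # signal mechanism rather than through POSIX signals.
    # Returns the previously installed handler.
    return signal.signal(signum, warning.handle)
```

A main loop would poll warning.warned between units of work and, once it is set, save state and exit before seconds_left() reaches zero.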
All application processes on all remaining nodes are always informed of the
death of a node. This is also done with a signal (SIGSHRINK), which by default
causes a process's runtime route cache to be flushed (to remove any cached
information on the dead node). If this signal is re-vectored for the purpose
of fault tolerance, the old handler should be called at the beginning of the
new handler. The signal does not, by itself, give the process information on
which node has been removed. One technique for getting this information is to
query the router for information on all relevant nodes using getroute(2). The
dead node will cause this routine to return an error.
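The two techniques in this paragraph, calling the old handler first and probing every relevant node to find the dead one, can be illustrated with a small simulation. It does not use LAM: chain_handler, find_dead_nodes, and lookup_route are hypothetical names, lookup_route stands in for getroute(2), and a raised LookupError stands in for the error that getroute(2) returns for a dead node.

```python
def chain_handler(old_handler, new_work):
    """Return a handler that runs old_handler first, then new_work."""
    def handler(signum):
        old_handler(signum)  # e.g. the default action that flushes the route cache
        new_work(signum)     # application-specific fault-tolerance logic
    return handler


def find_dead_nodes(node_ids, lookup_route):
    """Return the node IDs for which lookup_route raises LookupError."""
    dead = []
    for node in node_ids:
        try:
            lookup_route(node)
        except LookupError:
            # Stand-in for getroute(2) reporting an error for a removed node.
            dead.append(node)
    return dead


if __name__ == "__main__":
    routes = {0: "host-a", 1: "host-b", 2: "host-c"}  # made-up routing table
    log = []

    def flush_cache(signum):
        log.append("flush")

    def recover(signum):
        # A dict lookup raises KeyError, a subclass of LookupError.
        log.append(find_dead_nodes([0, 1, 2], lambda n: routes[n]))

    on_shrink = chain_handler(flush_cache, recover)
    del routes[1]   # simulate node 1 being removed
    on_shrink(0)    # simulate delivery of the shrink notification
    print(log)
```

Running the old handler first matters because the probe only gives a correct answer once stale cached routes have been discarded.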
FAULT TOLERANCE
If enabled with lamboot(1), LAM will watch for nodes that fail. The procedure
for removing a node that has failed is the same as lamshrink after the warning
step. In particular, the SIGSHRINK signal is delivered.
EXAMPLES
- lamshrink -v n1
- Remove LAM on n1. Report important steps as they are done.
- lamshrink n30 -w 10
- Inform all processes on LAM node 30 that the node will be dead in 10
seconds. Wait 10 seconds, then remove the node. Operate silently.
SEE ALSO
lamboot(1), lamnodes(1), ksignal(2), getroute(2)