UPSMON.CONF(5) | NUT Manual | UPSMON.CONF(5) |
NAME
upsmon.conf - Configuration for Network UPS Tools upsmon
DESCRIPTION
This file’s primary job is to define the systems that upsmon(8) will monitor and to tell it how to shut down the system when necessary. It will contain passwords, so keep it secure. Ideally, only the upsmon process should be able to read it.
Additionally, other optional configuration values can be set in this file.
CONFIGURATION DIRECTIVES
DEADTIME seconds
upsmon requires a UPS to provide status information every few seconds (see POLLFREQ and POLLFREQALERT) to keep things updated. If the status fetch fails, the UPS is marked stale. If it stays stale for more than DEADTIME seconds, the UPS is marked dead.
A dead UPS that was last known to be on battery is assumed to have changed to a low battery condition. This may force a shutdown if it is providing a critical amount of power to your system. This seems disruptive, but the alternative is barreling ahead into oblivion and crashing when you run out of power.
Note: DEADTIME should be a multiple of POLLFREQ and POLLFREQALERT. Otherwise, you’ll have "dead" UPSes simply because upsmon isn’t polling them quickly enough. Rule of thumb: take the larger of the two POLLFREQ values, and multiply by 3.
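The rule of thumb above is simple arithmetic, sketched here in Python. The function names are hypothetical and not part of NUT. Note that three times the larger interval is not automatically a multiple of the smaller one, so the second helper checks that separately.

```python
def suggested_deadtime(pollfreq: int, pollfreqalert: int, factor: int = 3) -> int:
    """Rule of thumb from the text: the larger polling interval times 3."""
    if pollfreq <= 0 or pollfreqalert <= 0:
        raise ValueError("polling intervals must be positive")
    return max(pollfreq, pollfreqalert) * factor


def is_multiple_of_both(deadtime: int, pollfreq: int, pollfreqalert: int) -> bool:
    """Check that DEADTIME is a multiple of both polling intervals."""
    return deadtime % pollfreq == 0 and deadtime % pollfreqalert == 0


# Illustrative values only: POLLFREQ 10 and POLLFREQALERT 5 suggest DEADTIME 30.
print(suggested_deadtime(10, 5))
```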
FINALDELAY seconds
If you need to let your users do something between the shutdown notification and the start of the shutdown command, increase this number. Remember, at this point your UPS battery is almost depleted, so don’t make this too big.
Alternatively, you can set this very low so you don’t wait around when it’s time to shut down. Some UPSes don’t give much warning for low battery and will require a value of 0 here for a safe shutdown.
Note
If FINALDELAY on the secondary is greater than HOSTSYNC on the primary, the primary will give up waiting for that secondary upsmon to disconnect.
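The condition in this note can be written as a one-line consistency check between two hosts' settings. This is a sketch with a hypothetical function name. It only restates the comparison from the note and does not model anything else upsmon does.

```python
def primary_gives_up_waiting(secondary_finaldelay: int, primary_hostsync: int) -> bool:
    """True when the secondary's FINALDELAY exceeds the primary's HOSTSYNC."""
    return secondary_finaldelay > primary_hostsync


# Illustrative values only.
print(primary_gives_up_waiting(secondary_finaldelay=30, primary_hostsync=15))
```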
HOSTSYNC seconds
When a UPS goes critical (on battery + low battery, or "FSD": forced shutdown), the secondary systems are supposed to disconnect and shut down right away. The HOSTSYNC timer keeps the primary upsmon from sitting there forever if one of the secondaries gets stuck.
This value is also used to keep secondary systems from getting stuck if the primary fails to respond in time. After a UPS becomes critical, the secondary will wait up to HOSTSYNC seconds for the primary to set the FSD flag. If that timer expires, the secondary upsmon will assume that the primary (or communications path to it) is broken and will shut down anyway.
Waiting for the FSD flag in this way keeps the secondaries from shutting down during a short-lived status change to "OB LB" and back that the secondaries see but the primary misses.
MINSUPPLIES num
Large or expensive server-type systems usually have more than one power supply, and can run with a few missing. The HP NetServer LH4 can run with 2 out of 4, for example, so you’d set it to 2. The idea is to keep the box running as long as possible, right?
Obviously you have to put the redundant supplies on different UPS circuits for this to make sense! See big-servers.txt in the docs subdirectory for more information and ideas on how to use this feature.
Also see the section on "power values" in upsmon(8).
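As a rough illustration of how MINSUPPLIES interacts with the power values on the MONITOR lines, the sketch below adds up the power values of the UPSes that are not critical and compares the total with MINSUPPLIES. The function name, the data shape, and the default of 1 are assumptions made for this example. The authoritative description is in upsmon(8).

```python
from typing import Iterable, Tuple


def must_shut_down(upses: Iterable[Tuple[int, bool]], minsupplies: int = 1) -> bool:
    """upses holds one (powervalue, is_critical) pair per MONITOR entry."""
    # Only UPSes that are not critical still count as feeding power supplies.
    powered = sum(powervalue for powervalue, critical in upses if not critical)
    return powered < minsupplies


# Four supplies on four UPSes, MINSUPPLIES 2: two critical UPSes are tolerated.
print(must_shut_down([(1, False), (1, False), (1, True), (1, True)], minsupplies=2))
```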
MONITOR system powervalue username password type
You must have at least one MONITOR directive in upsmon.conf.
system is a UPS identifier. It is in this form:
<upsname>[@<hostname>[:<port>]]
The default hostname is "localhost". For example, "myups@bigserver" is a UPS called "myups" on a system called "bigserver".
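The identifier form can be split mechanically, as in the Python sketch below. The names UpsSystem and parse_system are hypothetical, and this is not the parser upsmon itself uses. As a simplification it does not handle hostnames that contain colons (such as IPv6 literals), and it leaves the port unset when none is given.

```python
from typing import NamedTuple, Optional


class UpsSystem(NamedTuple):
    upsname: str
    hostname: str
    port: Optional[int]


def parse_system(spec: str) -> UpsSystem:
    """Split <upsname>[@<hostname>[:<port>]] into its parts."""
    upsname, at, hostpart = spec.partition("@")
    if not upsname:
        raise ValueError("missing UPS name")
    if not at:
        # No "@host" part: fall back to the documented default hostname.
        return UpsSystem(upsname, "localhost", None)
    hostname, colon, port = hostpart.partition(":")
    if not hostname:
        raise ValueError("missing hostname after '@'")
    if colon:
        if not port.isdigit():
            raise ValueError("port must be a number")
        return UpsSystem(upsname, hostname, int(port))
    return UpsSystem(upsname, hostname, None)


print(parse_system("myups@bigserver"))
```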
powervalue is an integer representing the number of power supplies that the UPS feeds on this system. Most normal computers have one power supply, and the UPS feeds it, so this value will be 1. You need a very large or special system to have anything higher here.
You can set the powervalue to 0 if you want to monitor a UPS that doesn’t actually supply power to this system. This is useful when you want to have upsmon do notifications about status changes on a UPS without shutting down when it goes critical.
The username and password on this line must match an entry in the upsd server system’s upsd.users(5) file.
If your username is "observer" and your password is "abcd", the MONITOR line might look like this (likely on a remote secondary system):
MONITOR myups@bigserver 1 observer abcd secondary
Meanwhile, the upsd.users on bigserver would look like this:
[observer]
    password = abcd
    upsmon secondary

[upswired]
    password = blah
    upsmon primary
And the copy of upsmon on that bigserver would run with the primary configuration:
MONITOR myups@bigserver 1 upswired blah primary
The type refers to the relationship with upsd(8). It can be either "primary" or "secondary". See upsmon(8) for more information on the meaning of these modes. The mode you pick here also goes in the upsd.users file, as seen in the example above.
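To make the field order of a MONITOR line concrete, here is a small validation sketch. parse_monitor is a hypothetical helper. Using shlex for quoting is an assumption, since the NUT configuration parser has its own quoting rules that may differ in detail.

```python
import shlex


def parse_monitor(line: str) -> dict:
    """Split a MONITOR line into its five documented fields."""
    fields = shlex.split(line)
    if len(fields) != 6 or fields[0] != "MONITOR":
        raise ValueError("expected: MONITOR system powervalue username password type")
    _, system, powervalue, username, password, mon_type = fields
    if not powervalue.isdigit():
        raise ValueError("powervalue must be a non-negative integer")
    if mon_type not in ("primary", "secondary"):
        raise ValueError("type must be 'primary' or 'secondary'")
    return {
        "system": system,
        "powervalue": int(powervalue),
        "username": username,
        "password": password,
        "type": mon_type,
    }


print(parse_monitor("MONITOR myups@bigserver 1 observer abcd secondary"))
```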
NOCOMMWARNTIME seconds
POLLFAIL_LOG_THROTTLE_MAX count
A negative value means standard behavior, and a zero value means to never repeat the message (log only on start and end/change of the failure state).
NOTIFYCMD command
This command is called with the full text of the message as one argument. The environment variable NOTIFYTYPE will contain the type string of whatever caused this event to happen.
If you need to use upssched(8), then you must make it your NOTIFYCMD by listing it here.
Note that this is only called for NOTIFY events that have EXEC set with NOTIFYFLAG. See NOTIFYFLAG below for more details.
Making this some sort of shell script might not be a bad idea. For more information and ideas, see docs/scheduling.txt.
Remember, this command also needs to be one element in the configuration file, so if your command has spaces, then wrap it in quotes.
NOTIFYCMD "/path/to/script --foo --bar"
This script is run in the background—that is, upsmon forks before it calls out to start it. This means that your NOTIFYCMD may have multiple instances running simultaneously if a lot of stuff happens all at once. Keep this in mind when designing complicated notifiers.
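A notifier only needs to read its single argument and the NOTIFYTYPE environment variable. The Python sketch below shows one possible shape. The function name and the output format are made up for illustration, and a real notifier would send the text somewhere useful rather than print it. Because several instances may run at once, it keeps no shared state.

```python
import os
import sys


def describe_event(argv, environ) -> str:
    """Combine the message argument with the NOTIFYTYPE environment variable."""
    message = argv[1] if len(argv) > 1 else ""
    notify_type = environ.get("NOTIFYTYPE", "UNKNOWN")
    return f"[{notify_type}] {message}"


if __name__ == "__main__":
    # upsmon passes the full message text as one argument.
    print(describe_event(sys.argv, os.environ))
```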
NOTIFYMSG type message
NOTIFYMSG ONLINE "UPS %s is getting line power"
NOTIFYMSG ONBATT "Someone pulled the plug on %s"
Note that %s is replaced with the identifier of the UPS in question.
The message must be one element in the configuration file, so if it contains spaces, you must wrap it in quotes.
NOTIFYMSG NOCOMM "Someone stole UPS %s"
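The %s substitution can be pictured with a short Python sketch. render_notifymsg is a hypothetical name, and plain string replacement is a simplification of whatever formatting upsmon performs internally.

```python
def render_notifymsg(template: str, ups_id: str) -> str:
    """Replace %s in a NOTIFYMSG template with the UPS identifier."""
    return template.replace("%s", ups_id)


print(render_notifymsg("Someone pulled the plug on %s", "myups@bigserver"))
```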
Possible values for type:
ONLINE
ONBATT
LOWBATT
FSD
COMMOK
COMMBAD
SHUTDOWN
REPLBATT
NOCOMM
NOPARENT
CAL
OFF
NOTOFF
BYPASS
NOTBYPASS
NOTIFYFLAG type flag[+flag]...
Examples:
NOTIFYFLAG ONLINE SYSLOG
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
Possible values for the flags:
SYSLOG
WALL
EXEC
IGNORE
If you use IGNORE, don’t use any other flags on the same line.
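The flag syntax and the IGNORE rule can be checked with a few lines of Python. parse_notifyflag is a hypothetical helper. Rejecting IGNORE combined with other flags is this sketch's stricter reading of the advice above, not necessarily what upsmon does with such a line.

```python
VALID_FLAGS = {"SYSLOG", "WALL", "EXEC", "IGNORE"}


def parse_notifyflag(line: str):
    """Return (type, set of flags) for a line like 'NOTIFYFLAG ONBATT SYSLOG+WALL'."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "NOTIFYFLAG":
        raise ValueError("expected: NOTIFYFLAG type flag[+flag]...")
    _, notify_type, flagspec = fields
    flags = flagspec.split("+")
    unknown = [flag for flag in flags if flag not in VALID_FLAGS]
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    if "IGNORE" in flags and len(flags) > 1:
        raise ValueError("IGNORE should not be combined with other flags")
    return notify_type, frozenset(flags)


print(parse_notifyflag("NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC"))
```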
POLLFREQ seconds
There are some catches. First, if you set the POLLFREQ too high, you may miss short-lived power events entirely. You also risk triggering the DEADTIME (see above) if you use a very large number.
Second, there is a point of diminishing returns if you set it too low. While upsd normally has all of the data available to it instantly, most drivers only refresh the UPS status once every 2 seconds. Polling any more than that usually doesn’t get you the information any faster.
POLLFREQALERT seconds
This should always be equal to or lower than the POLLFREQ value. By default it is also set to 5 seconds.
The warnings from the POLLFREQ entry about too-high and too-low values also apply here.
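The cautions for both polling directives can be gathered into one sanity check, sketched below. The function name and the thresholds are assumptions chosen for illustration: the 2-second figure comes from the driver refresh interval mentioned above, and treating "interval not shorter than DEADTIME" as the danger point is only a conservative guess.

```python
def pollfreq_warnings(pollfreq: int, pollfreqalert: int, deadtime: int,
                      driver_refresh: int = 2) -> list:
    """Return human-readable warnings for questionable polling settings."""
    warnings = []
    if pollfreqalert > pollfreq:
        warnings.append("POLLFREQALERT should not exceed POLLFREQ")
    if min(pollfreq, pollfreqalert) < driver_refresh:
        warnings.append("polling faster than most drivers refresh")
    if max(pollfreq, pollfreqalert) >= deadtime:
        warnings.append("polling interval may trigger DEADTIME")
    return warnings


# Illustrative values only.
print(pollfreq_warnings(pollfreq=5, pollfreqalert=5, deadtime=15))
```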
POWERDOWNFLAG filename
This is done to forcibly reset the secondary systems, so they don’t get stuck at the "halted" stage even if the power returns during the shutdown process. This usually does not work well on contact-closure UPSes that use the genericups driver.
See the config-notes.txt file in the docs subdirectory for more information. Refer to the section "Configuring automatic shutdowns for low battery events", or refer to the online version.
OFFDURATION seconds
A negative value means to disable decreasing the counter of working power supplies in such cases, and a zero makes the effect of detected "OFF" state immediate. Built-in default value is 30 (seconds), to put an "OFF" state into effect (decrease known-fed supplies count) if it persists for this many seconds.
Note
So far, only a device-wide "OFF" state is supported, which usually means that the load is completely unpowered. A bug-tracker issue was logged to design similar support for individual manageable outlets or outlet groups.
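The three cases for OFFDURATION (negative, zero, positive) can be summarized in a small decision function. This is a sketch with a hypothetical name. Treating "persists for this many seconds" as a greater-than-or-equal comparison is an assumption.

```python
def off_state_in_effect(off_seconds: float, offduration: int = 30) -> bool:
    """Decide whether a UPS reporting "OFF" should stop counting as a feed."""
    if offduration < 0:
        # Negative: never decrease the count of working power supplies.
        return False
    # Zero makes the effect immediate; otherwise the state must persist.
    return off_seconds >= offduration


# With the built-in default of 30 seconds, 10 seconds of "OFF" is not enough.
print(off_state_in_effect(10))
```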
RBWARNTIME seconds
If you need another value, set it here.
RUN_AS_USER username
The catch is that "nobody" can’t read your upsmon.conf, since by default it is installed so that only root can open it. This means you won’t be able to reload the configuration file, since it will be unavailable.
The solution is to create a new user just for upsmon, then make it run as that user. I suggest "nutmon", but you can use anything that isn’t already taken on your system. Just create a regular user with no special privileges and an impossible password.
Then, tell upsmon to run as that user, and make upsmon.conf readable by it. Your reloads will work, and your config file will stay secure.
This file should not be writable by the upsmon user, as it would be possible to exploit a hole, change the SHUTDOWNCMD to something malicious, then wait for upsmon to be restarted.
SHUTDOWNCMD command
When upsmon is a primary, it will allow any secondaries to log out before starting the local shutdown procedure.
Note that the command needs to be one element in the config file. If your shutdown command includes spaces, then put it in quotes to keep it together, for example:
SHUTDOWNCMD "/sbin/shutdown -h +0"
SHUTDOWNEXIT boolean|number
Some "secondary" systems with workloads that take considerable time to stop (e.g. virtual machines or large databases) can benefit from reporting (by virtue of logging off the data server) that they are ready for the "primary" system to begin its own shutdown and eventually to tell the UPS to cut the power - not as soon as they have triggered their own shutdown, but at a later point (e.g. when the upsmon service is stopped AFTER the heavier workloads).
Note that the actual ability to complete such shutdown depends on the remaining battery run-time at the moment when UPS power state becomes considered critical and the shutdowns begin. You may also have to tune HOSTSYNC on the NUT primary to be long enough for those secondaries to stop their services. In practice, it may be worthwhile to investigate ways to trigger shutdowns earlier on these systems, e.g. by setting up upssched integration, or dummy-ups driver with overrides for stricter battery.charge or battery.runtime triggers than used by the rest of your servers.
This option supports Boolean-style strings (yes/on/true or no/off/false) or numbers to define a delay (in seconds) between calling SHUTDOWNCMD and exiting the daemon. Zero means immediate exit (default), negative values mean never exiting of its own accord.
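The accepted values can be normalized to a single delay in seconds, as sketched below. shutdownexit_delay is a hypothetical helper. Mapping the "yes" strings to 0 (exit immediately) and the "no" strings to -1 (never exit) is an assumption inferred from the description of zero and negative numbers above.

```python
def shutdownexit_delay(value: str) -> int:
    """Normalize a SHUTDOWNEXIT value to a delay in seconds."""
    text = value.strip().lower()
    if text in ("yes", "on", "true"):
        return 0   # assumed: exit right after calling SHUTDOWNCMD
    if text in ("no", "off", "false"):
        return -1  # assumed: never exit on its own
    return int(text)  # raises ValueError for anything that is not an integer


print(shutdownexit_delay("30"))
```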
CERTPATH certificate file or database
With NSS:
With OpenSSL:
CERTIDENT certificate name database password
CERTHOST hostname certificate name certverify forcessl
Each entry maps a server name to the expected certificate name and to flags indicating whether the server certificate is verified and whether the connection must be secure.
CERTVERIFY 0 | 1
Without this, there is no guarantee that the upsd is the right host. Enabling this greatly reduces the risk of man-in-the-middle attacks. This effectively forces the use of SSL, so don’t use this unless all of your upsd hosts are ready for SSL and have their certificates in order.
When upsmon is compiled with NSS support for SSL, this setting can be overridden for a host specified with a CERTHOST directive.
FORCESSL 0 | 1
If you don’t use CERTVERIFY 1, then this will at least make sure that nobody can sniff your sessions without a large effort. Setting this will make upsmon drop connections if the remote upsd doesn’t support SSL, so don’t use it unless all of them have it running.
When upsmon is compiled with NSS support for SSL, this setting can be overridden for a host specified with a CERTHOST directive.
DEBUG_MIN INTEGER
Note
If the running daemon receives a reload command, the DEBUG_MIN INTEGER value in the configuration file can be used to tune debugging verbosity in the running service daemon. It is recommended to comment it out or set the minimum to an explicit zero when done, to avoid huge journals and I/O system abuse. Keep in mind that for this run-time tuning, a DEBUG_MIN value present in the reloaded configuration file is applied instantly and overrides any previously set value, whether from a file or from CLI options, regardless of whether the older logging level was higher or lower than the new one. A missing (or commented-out) value, however, does not change the previously active logging verbosity.
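The reload rule in this note reduces to "a present value always wins, an absent value changes nothing". A minimal Python sketch, with a hypothetical function name and None standing for a missing or commented-out DEBUG_MIN line:

```python
from typing import Optional


def level_after_reload(active_level: int, debug_min: Optional[int]) -> int:
    """Return the debug verbosity in effect after a configuration reload."""
    if debug_min is None:
        # No DEBUG_MIN in the reloaded file: keep the current verbosity.
        return active_level
    # A present value overrides the old level, whether higher or lower.
    return debug_min


print(level_after_reload(active_level=6, debug_min=2))
```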
SEE ALSO
upsmon(8), upsd(8), nutupsdrv(8).
Internet resources:
The NUT (Network UPS Tools) home page: https://www.networkupstools.org/
09/10/2024 | Network UPS Tools 2.8.1 |