NAME
pbs_sched_tcl - PBS Tcl scheduler
SYNOPSIS
pbs_sched [-a alarm] [-b file] [-d home] [-i file]
[-L logfile] [-p file] [-S port] [-t file] [-v]
[-c file]
DESCRIPTION
The pbs_sched program runs in conjunction with the PBS server. It queries
the server about the state of PBS and communicates with pbs_mom to get
information about the status of running jobs, available memory, and so on.
It then decides which jobs to run.
pbs_sched must be executed with root permission.
OPTIONS
- -a alarm
- This specifies the time in seconds to wait for a schedule
run to finish. If a script takes too long to finish, an alarm signal is
sent, and the scheduler is restarted. If a core file does not exist in the
current directory, abort() is called and a core file is generated.
The default for alarm is 180 seconds.
- -b file
- This specifies the "body" file. The file given is
read into memory once at program start or after the program receives a
SIGHUP and executed each time the scheduler is awakened by the server. If
this option is not given, the file "sched_tcl" in the directory
PBS_HOME/sched_priv is read for the body code.
- -d home
- This specifies the PBS home directory, PBS_HOME. The
current working directory of the scheduler is PBS_HOME/sched_priv. If this
option is not given, PBS_HOME defaults to $PBS_SERVER_HOME as defined
during the PBS build procedure.
- -i file
- This specifies the "initialize" file. The file
given is executed once before the main processing loop is entered. If this
option is not given, no initialization code is executed.
- -L logfile
- Specifies an absolute path name of the file to use as the
log file. If not specified, the scheduler will open a file named for the
current date in the PBS_HOME/sched_logs directory (see the -d
option).
- -p file
- This specifies the "print" file. Any output from
the Tcl code which is written to standard out or standard error will be
written to this file. If this option is not given, the file used will be
PBS_HOME/sched_priv/sched_out. See the -d option.
- -S port
- This specifies the port to use. If this option is not
given, the default port for the PBS scheduler is used.
- -t file
- This specifies the "terminator" file. If a QUIT
command is sent from the server, this code is executed before the
scheduler exits. If this option is not given, no special termination
handling is done.
- -v
- This puts the scheduler into "verbose" mode. Errors
are always logged, whether or not this flag is set. With the flag, some
"uninteresting" events are logged as well, for example a message
each time the server contacts the scheduler.
- -c file
- Specifies a configuration file; see CONFIGURATION FILE below. If
this is a relative file name, it is taken relative to PBS_HOME/sched_priv
(see the -d option). If the -c option is not supplied, pbs_sched will not
attempt to open a configuration file.
The options that specify file names may be absolute or relative. If they are
relative, their root directory will be PBS_HOME/sched_priv.
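The rule for relative file names can be modelled in a few lines. The following is a minimal sketch in Python, not the scheduler's own code; the function name resolve_sched_file is hypothetical, and /usr/spool/pbs is used only because it is the typical PBS_HOME mentioned under FILES.

```python
import os.path


def resolve_sched_file(name, pbs_home):
    """Model the documented rule for file-name options.

    Absolute names are used as given; relative names are rooted at
    PBS_HOME/sched_priv. This is an illustration, not pbs_sched code.
    """
    if os.path.isabs(name):
        return name
    return os.path.join(pbs_home, "sched_priv", name)


if __name__ == "__main__":
    # A relative body file name, as might be given with -b.
    print(resolve_sched_file("sched_tcl", "/usr/spool/pbs"))
    # An absolute name (hypothetical path) is returned unchanged.
    print(resolve_sched_file("/tmp/body.tcl", "/usr/spool/pbs"))
```

Under these assumptions, the default body file "sched_tcl" resolves to the same path whether it is named explicitly or left to the default.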
USAGE
This version of the scheduler requires knowledge of the Tcl language. A set of
functions to communicate with the PBS server and resource monitor has been
added to those normally available with Tcl. All these calls will set the Tcl
variable "pbs_errno" to a value to indicate if an error occurred. In
all cases, the value "0" means no error. If a call to a Resource
Monitor function is made, any error value will come from the system supplied
errno variable. If the function call communicates with the PBS Server,
any error value will come from the error number returned by the server.
- openrm host ?port?
- Creates a connection to the PBS Resource Monitor on
host using port as the port number or the standard port for
the resource monitor if it is not given. A connection handle is returned.
If the open is successful, this will be a non-negative integer. If not, an
error occurred.
- closerm connection
- The parameter connection is a handle to a resource
monitor which was previously returned from openrm. This connection
is closed. Nothing is returned.
- downrm connection
- Sends a command to the connected resource monitor to
shut down. Nothing is returned.
- configrm connection filename
- Sends a command to the connected resource monitor to read
the configuration file given by filename. If this is successful, a
"0" is returned, otherwise, "-1" is returned.
- addreq connection request
- A resource request is sent to the connected resource
monitor. If this is successful, a "0" is returned, otherwise,
"-1" is returned.
- getreq connection
- One resource request response from the connected resource
monitor is returned. If an error occurred or there are no more responses,
an empty string is returned.
- allreq request
- A resource request is sent to all connected resource
monitors. The number of streams acted upon is returned.
- flushreq
- All resource requests previously sent to all connected
resource monitors are flushed out to the network. Nothing is
returned.
- activereq
- The connection number of the next stream with something to
read is returned. If there is nothing to read from any of the connections,
a negative number is returned.
- fullresp flag
- Evaluates flag as a boolean value and sets the
response mode used by getreq to full if flag
evaluates to "true". The full return from a resource monitor
includes the original request followed by an equal sign followed by the
response. The default situation is only to return the response following
the equal sign. If a script needs to "see" the entire line, this
function may be used.
- pbsstatserv
- The server is sent a status request for information about
the server itself. If the request succeeds, a list with three elements is
returned, otherwise an empty string is returned. The first element is the
server's name. The second is a list of attributes. The third is the
"text" associated with the server (usually blank).
- pbsstatjob
- The server is sent a status request for information about
all jobs resident within the server. If the request succeeds, a list
is returned, otherwise an empty string is returned. The list contains an
entry for each job. Each element is a list with three elements. The first
is the job's jobid. The second is a list of attributes. The attribute
names which specify resources will have a name of the form
"Resource_List:name" where "name" is the resource
name. The third is the "text" associated with the job (usually
blank).
- pbsstatque
- The server is sent a status request for information about
all queues resident within the server. If the request succeeds, a list is
returned, otherwise an empty string is returned. The list contains an
entry for each queue. Each element is a list with three elements. The
first is the queue's name. The second is a list of attributes similar to
pbsstatjob. The third is the "text" associated with the
queue (usually blank).
- pbsstatnode
- The server is sent a status request for information about
all nodes defined within the server. If the request succeeds, a list is
returned, otherwise an empty string is returned. The list contains an
entry for each node. Each element is a list with three elements. The
first is the node's name. The second is a list of attributes similar to
pbsstatjob. The third is the "text" associated with the
node (usually blank).
- pbsselstat
- The server is sent a status request for information about
all runnable jobs resident within the server. If the request succeeds,
a list similar to pbsstatjob is returned, otherwise an empty string
is returned.
- pbsrunjob jobid ?location?
- Run the job given by jobid at the location given by
location. If location is not given, the default location is
used. If this is successful, a "0" is returned, otherwise,
"-1" is returned.
- pbsasyrunjob jobid ?location?
- Run the job given by jobid at the location given by
location without waiting for a positive response that the job has
actually started. If location is not given, the default location is
used. If this is successful, a "0" is returned, otherwise,
"-1" is returned.
- pbsrerunjob jobid
- Re-runs the job given by jobid. If this is
successful, a "0" is returned, otherwise, "-1" is
returned.
- pbsdeljob jobid
- Delete the job given by jobid. If this is
successful, a "0" is returned, otherwise, "-1" is
returned.
- pbsholdjob jobid
- Place a hold on the job given by jobid. If this is
successful, a "0" is returned, otherwise, "-1" is
returned.
- pbsmovejob jobid ?location?
- Move the job given by jobid to the location given by
location. If location is not given, the default location is
used. If this is successful, a "0" is returned, otherwise,
"-1" is returned.
- pbsqenable queue
- Set the "enabled" attribute for the queue given
by queue to true. If this is successful, a "0" is
returned, otherwise, "-1" is returned.
- pbsqdisable queue
- Set the "enabled" attribute for the queue given
by queue to false. If this is successful, a "0" is
returned, otherwise, "-1" is returned.
- pbsqstart queue
- Set the "started" attribute for the queue given
by queue to true. If this is successful, a "0" is
returned, otherwise, "-1" is returned.
- pbsqstop queue
- Set the "started" attribute for the queue given
by queue to false. If this is successful, a "0" is
returned, otherwise, "-1" is returned.
- pbsalterjob jobid attribute_list
- Alter the attributes for a job specified by jobid.
The parameter attribute_list is the list of attributes to be
altered. There can be more than one. Each attribute consists of a list of
three elements. The first is the name, the second the resource and the
third is the new value. If the alter is successful, a "0" is
returned, otherwise, "-1" is returned.
- pbsrescquery resource_list
- Obtain information about the resources specified by
resource_list. This will be a list of strings. If the request
succeeds, a list with the same number of elements as resource_list
is returned. Each element in this list will be a list with four numbers.
The numbers specify available, allocated, reserved,
and down in that order.
- pbsrescreserve resource_id resource_list
- Make (or extend) a reservation for the resources specified
by resource_list which will be given as a list of strings. The
parameter resource_id is a number which provides a unique
identifier for a reservation being tracked by the server. If
resource_id is given as "0", a new reservation is
created. In this case, a new identifier is generated and returned by the
function. If an old identifier is used, that same number will be returned.
The Tcl variable "pbs_errno" will be set to indicate the success
or failure of the reservation.
- pbsrescrelease resource_id
- The reservation specified by resource_id is
released.
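The four numbers returned by pbsrescquery are positional, which makes them easy to misread. As a minimal sketch of the documented ordering (available, allocated, reserved, down), the Python model below pairs each requested resource string with its four counts. The names ResourceCounts and label_rescquery, and the sample resource strings, are hypothetical; the Tcl list is represented here as nested Python lists.

```python
from collections import namedtuple

# Field order follows the documented order of the four numbers.
ResourceCounts = namedtuple(
    "ResourceCounts", ["available", "allocated", "reserved", "down"]
)


def label_rescquery(resource_list, result):
    """Pair each requested resource with its four counts.

    resource_list: the strings passed to pbsrescquery.
    result: one list of four numbers per requested resource.
    """
    if len(resource_list) != len(result):
        raise ValueError("result must have one entry per requested resource")
    labelled = []
    for name, numbers in zip(resource_list, result):
        if len(numbers) != 4:
            raise ValueError("expected four numbers for %r" % name)
        labelled.append((name, ResourceCounts(*(int(n) for n in numbers))))
    return labelled


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    for name, counts in label_rescquery(["nodes", "ncpus"],
                                        [[4, 2, 0, 1], [8, 8, 0, 0]]):
        print(name, counts.available, counts.allocated,
              counts.reserved, counts.down)
```

A scheduler script would do the equivalent with Tcl list indexing; the point of the sketch is only the one-to-one pairing and the fixed order of the four fields.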
The two following commands are not normally used by the scheduler. They are
included here because there could be a need for a scheduler to contact a
server other than the one which it normally communicates with. Also, these
commands are used by the Tcl tools.
- pbsconnect ?server?
- Make a connection to the named server or the default server
if a parameter is not given. Only one connection to a server is allowed at
any one time.
- pbsdisconnect
- Disconnect from the currently connected server.
The above Tcl functions use PBS interface library calls for communication with
the server and the PBS resource monitor library to communicate with pbs_mom.
- datetime ?day? ?time?
- The number of arguments used determines the type of date to
be calculated. With no arguments, the current POSIX date is returned. This
is an integer in seconds.
With one argument there are two possible formats. The first is a 12 (or
more) character string specifying a complete date in the following format:
YYMMDDhhmmss
All characters must be digits. The year (YY) is given by the first two (or
more) characters and is the number of years since 1900. The month (MM) is
the number of the month [01-12]. The day (DD) is the day of the month
[01-32]. The hour (hh) is the hour of the day [00-23]. The minute (mm) is
minutes after the hour [00-59]. The second (ss) is seconds after the
minute [00-59]. The POSIX date for the given date/time is returned.
The second option with one argument is a relative time. The format for this
is
HH:MM:SS
With hours (HH), minutes (MM) and seconds (SS) being separated by colons
":". The number returned in this case will be the number of
seconds in the interval specified, not an absolute POSIX date.
With two arguments a relative date is calculated. The first argument
specifies a day of the week and must be one of the following strings:
"Sun", "Mon", "Tue", "Wed",
"Thr", "Fri", or "Sat". The second argument
is a relative time as given above. The POSIX date calculated will be the
day of the week given which follows the current day, and the time given in
the second argument. For example, if the current day was Monday, and the
two arguments were "Fri" and "04:30:00", the date
calculated would be the POSIX date for the Friday following the current
Monday, at four-thirty in the morning. If the day specified and the
current day are the same, the current day is used, not the day one week
later.
- strftime format time
- This function calls the POSIX function strftime().
It requires two arguments. The first is a format string. The format
conventions are the same as those for the POSIX function strftime(). The
second argument is a POSIX calendar time in seconds, as returned by
datetime. It returns a string based on the format given. This gives
the ability to extract information about a time, or format it for
printing.
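The datetime rules above can be checked with a small model. The sketch below is written in Python and is not the scheduler's implementation; the names pbs_datetime and relative_seconds are hypothetical, local time is assumed for absolute dates, and the optional now parameter exists only so the weekday rule can be tried against a fixed date.

```python
import re
import time
from datetime import datetime, timedelta

# Day strings as documented; indices match datetime.weekday() (Monday is 0).
DAYS = ["Mon", "Tue", "Wed", "Thr", "Fri", "Sat", "Sun"]


def relative_seconds(text):
    """Convert a relative time HH:MM:SS to a number of seconds."""
    match = re.fullmatch(r"([0-9]+):([0-9]+):([0-9]+)", text)
    if match is None:
        raise ValueError("expected HH:MM:SS, got %r" % text)
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def pbs_datetime(*args, now=None):
    """Model of the three documented call forms of datetime."""
    now = now or datetime.now()
    if len(args) == 0:
        return int(now.timestamp())
    if len(args) == 1:
        arg = args[0]
        if ":" in arg:
            # Relative time: an interval, not an absolute POSIX date.
            return relative_seconds(arg)
        if re.fullmatch(r"[0-9]{12,}", arg) is None:
            raise ValueError("expected YYMMDDhhmmss, got %r" % arg)
        # Everything before the last ten digits is the years since 1900.
        year = 1900 + int(arg[:-10])
        month, day, hour, minute, second = (
            int(arg[i:i + 2]) for i in range(len(arg) - 10, len(arg), 2)
        )
        return int(datetime(year, month, day, hour, minute, second).timestamp())
    if len(args) == 2:
        day_name, rel = args
        # Zero days ahead when the named day is the current day.
        ahead = (DAYS.index(day_name) - now.weekday()) % 7
        midnight = datetime(now.year, now.month, now.day) + timedelta(days=ahead)
        target = midnight + timedelta(seconds=relative_seconds(rel))
        return int(target.timestamp())
    raise TypeError("datetime takes at most two arguments")


if __name__ == "__main__":
    when = pbs_datetime("Fri", "04:30:00")
    # Formatting mirrors what the strftime function is described as doing.
    print(time.strftime("%a %Y-%m-%d %H:%M:%S", time.localtime(when)))
```

The modulo arithmetic is what gives the documented same-day behavior: asking for the current weekday yields a difference of zero days, not seven.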
The Tcl interpreter is started at program initialization and after a reset (the
receipt of a SIGHUP signal). It is not deleted between scheduling runs so
variables which are set in one can be accessed later.
The "initialize" and "terminator" files are run with no
supplied connection to the server. This means that none of the above functions
which talk to the server will work unless
pbsconnect is called first.
The "body" file is run with a connection to the server already
established.
CONFIGURATION FILE
A configuration file may be specified with the -c option. This file may be used
to specify the hosts (servers) which are allowed to connect to pbs_sched. The
hosts are specified in the configuration file in a manner identical to that
used in pbs_mom. There is one line per host with the syntax:
$clienthost hostname
where clienthost and hostname are separated by white space.
Two host names are always allowed to connect to pbs_sched,
"localhost" and the name returned to pbs_sched by the system call
gethostname(). These names need not be specified in the configuration file.
The configuration file must be "secure". It must be owned by a user id
and group id less than 10 and not be world writable.
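The "secure" rule can be expressed as a simple predicate, which may help when checking a file before starting the scheduler. This is a minimal sketch in Python based only on the rule as stated above (owner user id and group id below 10, not world writable); the function names are hypothetical, and pbs_sched may apply further checks that are not described here.

```python
import os
import stat


def is_secure(uid, gid, mode):
    """Return True if the ownership and mode satisfy the stated rule."""
    world_writable = bool(mode & stat.S_IWOTH)
    return uid < 10 and gid < 10 and not world_writable


def config_file_is_secure(path):
    """Apply the rule to a file on disk."""
    info = os.stat(path)
    return is_secure(info.st_uid, info.st_gid, info.st_mode)


if __name__ == "__main__":
    # Owned by root:root, mode 0644: satisfies the rule.
    print(is_secure(0, 0, 0o100644))
    # Same owner, but world writable: fails the rule.
    print(is_secure(0, 0, 0o100666))
```

Keeping the predicate separate from the os.stat call makes the rule easy to test without needing a root-owned file.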
FILES
- $PBS_SERVER_HOME/sched_priv
- the default directory for configuration files, typically
(/usr/spool/pbs)/sched_priv.
SIGNAL HANDLING
A C based scheduler will handle the following signals:
- SIGHUP
- The scheduler will close and reopen its log file and reread
the config file if one exists.
- SIGALRM
- If the site-supplied scheduling module exceeds the time
limit, the alarm will cause the scheduler to attempt to core dump and
restart itself.
- SIGINT and SIGTERM
- Will result in an orderly shutdown of the scheduler.
All other signals have the default action installed.
EXIT STATUS
Upon normal termination, an exit status of zero is returned.
SEE ALSO
pbs_scheduler_cc(8B), pbs_scheduler_rule(8B), pbs_server(8B), and pbs_mom(8B).
PBS Internal Design Specification