NAME¶
pdcp - copy files to groups of hosts in parallel
rpdcp - (reverse pdcp) copy files from a group of hosts in parallel
SYNOPSIS¶
pdcp [
options]... src [src2...] dest
rpdcp [
options]... src [src2...] dir
DESCRIPTION¶
pdcp is a variant of the
rcp(1) command. Unlike
rcp(1), which copies
files to a single remote host,
pdcp can copy files to multiple remote
hosts in parallel. However,
pdcp does not recognize files in the format
``rname@rhost:path,'' therefore all source files must be on the local host
machine. Destination nodes must be listed on the
pdcp command line
using a suitable target nodelist option (See the
OPTIONS section
below). Each destination node listed must have
pdcp installed for the
copy to succeed.
When
pdcp receives SIGINT (ctrl-C), it lists the status of current
threads. A second SIGINT within one second terminates the program. Pending
threads may be canceled by issuing ctrl-Z within one second of ctrl-C. Pending
threads are those that have not yet been initiated, or are still in the
process of connecting to the remote host.
Like
pdsh(1), the functionality of
pdcp may be supplemented by
dynamically loadable modules. In
pdcp, the modules may provide a new
connect protocol (replacing the standard
rsh(1) protocol), filtering options
(e.g. excluding hosts that are down), and/or host selection options (e.g.
-a selects all nodes from a local config file). By default,
pdcp
requires at least one "rcmd" module to be loaded (to provide the
channel for remote copy).
REVERSE PDCP¶
rpdcp performs a reverse parallel copy. Rather than copying files to
remote hosts, files are retrieved from remote hosts and stored locally. All
directories or files retrieved will be stored with their remote hostname
appended to the filename. The destination file must be a directory when this
option is used.
In other respects,
rpdcp is exactly like
pdcp, and further
statements regarding
pdcp in this manual also apply to
rpdcp.
RCMD MODULES¶
The method by which
pdcp connects to remote hosts may be selected at
runtime using the
-R option (See
OPTIONS below). This
functionality is ultimately implemented via dynamically loadable modules, and
so the list of available options may be different from installation to
installation. A list of currently available rcmd modules is printed when using
any of the
-h,
-V, or
-L options. The default rcmd module
will also be displayed with the
-h and
-V options.
A list of
rcmd modules currently distributed with
pdcp follows.
- rsh
- Uses an internal, thread-safe implementation of BSD rcmd(3)
to run commands using the standard rsh(1) protocol.
- ssh
- Uses a variant of popen(3) to run multiple copies of the
ssh(1) command.
- mrsh
- This module uses the mrsh(1) protocol to execute jobs on
remote hosts. The mrsh protocol uses a credential based authentication,
forgoing the need to allocate reserved ports. In other aspects, it acts
just like rsh.
- krb4
- The krb4 module allows users to execute remote commands
after authenticating with kerberos. Of course, the remote rshd daemons
must be kerberized.
- xcpu
- The xcpu module uses the xcpu service to execute remote
commands.
OPTIONS¶
The list of available
pdcp options is determined at runtime by
supplementing the list of standard
pdcp options with any options
provided by loaded
rcmd and
misc modules. In some cases, options
provided by modules may conflict with each other. In these cases, the modules
are incompatible and the first module loaded wins.
Standard target nodelist options¶
- -w TARGETS,...
- Target and or filter the specified list of hosts. Do not
use with any other node selection options (e.g. -a, -g, if
they are available). No spaces are allowed in the comma-separated list.
Arguments in the TARGETS list may include normal host names, a
range of hosts in hostlist format (See HOSTLIST EXPRESSIONS), or a
single `-' character to read the list of hosts on stdin.
If a host or hostlist is preceded by a `-' character, this causes those
hosts to be explicitly excluded. If the argument is preceded by a single
`^' character, it is taken to be the path to file containing a list of
hosts, one per line. If the item begins with a `/' character, it is taken
as a regular expression on which to filter the list of hosts (a regex
argument may also be optionally trailed by another '/', e.g. /node.*/). A
regex or file name argument may also be preceeded by a minus `-' to
exclude instead of include thoses hosts.
A list of hosts may also be preceded by "user@" to specify a
remote username other than the default, or "rcmd_type:" to
specify an alternate rcmd connection type for these hosts. When used
together, the rcmd type must be specified first, e.g.
"ssh:user1@host0" would use ssh to connect to host0 as user
"user1."
- -x host,host,...
- Exclude the specified hosts. May be specified in
conjunction with other target node list options such as -a and
-g (when available). Hostlists may also be specified to the
-x option (see the HOSTLIST EXPRESSIONS section below).
Arguments to -x may also be preceeded by the filename (`^') and
regex ('/') characters as described above, in which case the resulting
hosts are excluded as if they had been given to -w and preceeded
with the minus `-' character.
Standard pdcp options¶
- -h
- Output usage menu and quit. A list of available rcmd
modules will be printed at the end of the usage message.
- -q
- List option values and the target nodelist and exit without
action.
- -b
- Disable ctrl-C status feature so that a single ctrl-C kills
parallel copy. (Batch Mode)
- -r
- Copy directories recursively.
- -p
- Preserve modification time and modes.
- -e PATH
- Explicitly specify path to remote pdcp binary
instead of using the locally executed path. Can also be set via the
environment variable PDSH_REMOTE_PDCP_PATH.
- -l user
- This option may be used to copy files as another user,
subject to authorization. For BSD rcmd, this means the invoking user and
system must be listed in the user´s .rhosts file (even for
root).
- -t seconds
- Set the connect timeout. Default is 10 seconds.
- -f number
- Set the maximum number of simultaneous remote copies to
number. The default is 32.
- -R name
- Set rcmd module to name. This option may also be set
via the PDSH_RCMD_TYPE environment variable. A list of available rcmd
modules may be obtained via either the -h or -L
options.
- -M name,...
- When multiple misc modules provide the same options
to pdsh, the first module initialized "wins" and
subsequent modules are not loaded. The -M option allows a list of
modules to be specified that will be force-initialized before all others,
in-effect ensuring that they load without conflict (unless they conflict
with eachother). This option may also be set via the PDSH_MISC_MODULES
environment variable.
- -L
- List info on all loaded pdcp modules and quit.
- -d
- Include more complete thread status when SIGINT is
received, and display connect and command time statistics on stderr when
done.
- -V
- Output pdcp version information, along with list of
currently loaded modules, and exit.
HOSTLIST EXPRESSIONS¶
As noted in sections above,
pdcp accepts ranges of hostnames in the
general form: prefix[n-m,l-k,...], where n < m and l < k, etc., as an
alternative to explicit lists of hosts. This form should not be confused with
regular expression character classes (also denoted by ``[]''). For example,
foo[19] does not represent foo1 or foo9, but rather represents a degenerate
range: foo19.
This range syntax is meant only as a convenience on clusters with a prefixNN
naming convention and specification of ranges should not be considered
necessary -- the list foo1,foo9 could be specified as such, or by the range
foo[1,9].
Some examples of range usage follow:
Copy /etc/hosts to foo01,foo02,...,foo05
pdcp -w foo[01-05] /etc/hosts /etc
Copy /etc/hosts to foo7,foo9,foo10
pdcp -w foo[7,9-10] /etc/hosts /etc
Copy /etc/hosts to foo0,foo4,foo5
pdcp -w foo[0-5] -x foo[1-3] /etc/hosts /etc
As a reminder to the reader, some shells will interpret brackets ('[' and ']')
for pattern matching. Depending on your shell, it may be necessary to enclose
ranged lists within quotes. For example, in tcsh, the first example above
should be executed as:
pdcp -w "foo[01-05]" /etc/hosts /etc
ORIGIN¶
Pdsh/pdcp was originally a rewrite of IBM
dsh(1) by Jim Garlick
<garlick@llnl.gov> on LLNL's ASCI Blue-Pacific IBM SP system. It is now
also used on Linux clusters at LLNL.
LIMITATIONS¶
When using
ssh for remote execution, stderr of ssh to be folded in with
that of the remote command. When invoked by pdcp, it is not possible for ssh
to prompt for confirmation if a host key changes, prompt for passwords if RSA
keys are not configured properly, etc.. Finally, the connect timeout is only
adjustable with ssh when the underlying ssh implementation supports it, and
pdsh has been built to use the correct option.
SEE ALSO¶
pdsh(1)