'\" t .\" Title: gfpcopy .\" Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author] .\" Generator: DocBook XSL Stylesheets vsnapshot .\" Date: 27 Aug 2015 .\" Manual: Gfarm .\" Source: Gfarm .\" Language: English .\" .TH "GFPCOPY" "1" "27 Aug 2015" "Gfarm" "Gfarm" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" gfpcopy \- copy Gfarm files in parallel .SH "SYNOPSIS" .HP \w'\fBgfpcopy\fR\ 'u \fBgfpcopy\fR [\-nqvdpPU] [\-X\ \fIregexp\fR] [\-S\ \fIsource\-domainname\fR] [\-h\ \fIsource\-hostfile\fR] [\-D\ \fIdestination\-domainname\fR] [\-H\ \fIdestination\-hostfile\fR] [\-j\ \fInum\-of\-processes\fR] [\-J\ \fInum\-of\-processes\fR] [\-M\ \fIlimit\-byte\fR] [\-z\ \fIminimum\-byte\fR] [\-Z\ \fImaximum\-byte\fR] [\-w\ \fIway\-of\-scheduling\fR] [\-W\ \fIkilobytes\-for\-threshold\fR] [\-s\ \fIKB/s\-to\-simulate\fR] [\-F\ \fInum\-for\-readahead\fR] [\-b\ \fIbufsize\fR] [\-f] [\-e] [\-k] \fIsource\-path\fR \fIdestination\-path\fR .SH "DESCRIPTION" .PP \fBgfpcopy\fR copies files in parallel\&. .PP When the \fIsource\-path\fR is a directory, files under the directory will be copied recursively\&. .PP When the \fIdestination\-path\fR does not exist, the directory is created\&. When the \fIdestination\-path\fR exists, a directory of the same name as the \fIsource\-path\fR is created under the \fIdestination\-path\fR\&. .PP A set of source/destination hosts can be specified by a domain name and/or a hostlist file\&. When both a domain name and a hostlist file are specified, a set of hosts is determined by both conditions\&. When a set of source hosts is specified, only files stored on the source hosts are copied\&. .PP \fBgfpcopy\fR also retrieves the directory entries in parallel\&. .SH "SOURCE PATH" .PP \fIsource\-path\fR must be one of the following formats\&. Files on HPSS cannot be copied\&. .PP \fIpath\-name\fR .RS 4 is a relative path or an absolute path of a local file system\&. When the path is a mount point on gfarm2fs, files are copied without passing through the gfarm2fs\&. .RE .PP \fIgfarm:\&.\&.\&.\fR .RS 4 is a Gfarm URL\&. .RE .PP \fIfile:\&.\&.\&.\fR .RS 4 is an URL of a local file system\&. .RE .SH "DESTINATION PATH" .PP \fIdestination\-path\fR must be one of the following formats\&. .PP \fIpath\-name\fR .RS 4 is a relative path or an absolute path of a local file system\&. When the path is a mount point on gfarm2fs, files are copied without passing through the gfarm2fs\&. .RE .PP \fIgfarm:\&.\&.\&.\fR .RS 4 is a Gfarm URL of a directory\&. .RE .PP \fIfile:\&.\&.\&.\fR .RS 4 is an URL of a directory on a local file system\&. .RE .PP \fIhpss:\&.\&.\&.\fR .RS 4 is an URL of a directory on HPSS\&. If the same directory as the \fIsource\-path\fR exists under this directory, the \fIsource\-path\fR cannot be copied\&. The differential copy is not supported for HPSS\&. Relative path such as "hpss:"\&. and "hpss:dir" can be specified\&. .RE .SH "GFPCOPY OPTIONS" .PP These are options only for \fBgfpcopy\fR\&. .PP \fB\-b\fR \fIbufsize\fR .RS 4 Specifies the buffer size in bytes to copy\&. .sp The default value is 64 KiB (64 * 1024)\&. .RE .PP \fB\-f\fR .RS 4 With the \-f option, existing files will be overwritten when the size is different or the modification time (mtime) is different from the source file\&. .sp Without the \-f option, existing files will be overwritten when they are older than the corresponding source files in the modification time\&. .RE .PP \fB\-e\fR .RS 4 Skips existing files in order to execute gfpcopy simultaneously\&. .RE .PP \fB\-k\fR .RS 4 Does not copy symbolic links\&. .RE .SH "COMMON OPTIONS" .PP The following options are common options for \fBgfprep\fR and \fBgfpcopy\fR\&. .PP \fB\-X\fR \fIregexp\fR .RS 4 Skips files matched by the pattern of \fIregexp\fR\&. When multiple patterns need to be specified, specify \-X options multiple times\&. .RE .PP \fB\-S\fR \fIsource\-domainname\fR .RS 4 Creates file replicas or copies files only stored on the hosts in the specified domain name\&. .RE .PP \fB\-h\fR \fIsource\-hostfile\fR .RS 4 Creates file replicas or copies files only stored on the hosts listed in the specified hostfile\&. The \fIsource\-hostfile\fR consists of a file system node name on each line\&. .sp If ``\-\*(Aq\*(Aq is specified, standard input is used to read the host list\&. .RE .PP \fB\-L\fR .RS 4 Creates file replicas or copies files from the hosts specified by the \-S or \-h option\&. .RE .PP \fB\-D\fR \fIdestination\-domainname\fR .RS 4 Specifies the domain name for destination\&. .sp If neither this nor the \fB\-H\fR option is specified, replicas may be copied to any available host\&. .RE .PP \fB\-H\fR \fIdestination\-hostfile\fR .RS 4 Specifies a file which describes hostnames for destination\&. The \fIdestination\-hostfile\fR consists of a file system node name on each line\&. .sp If ``\-\*(Aq\*(Aq is specified, standard input is used to read the host list\&. .RE .PP \fB\-j\fR \fInum\-of\-processes\fR .RS 4 Specifies the maximum number of processes to create file replicas (or copy files) simultaneously\&. .sp The default value is the parameter of client_parallel_copy in gfarm2\&.conf\&. (see man gfarm2\&.conf) .sp The maximum number of process per file system node for source or destination is the number of CPUs (see man \fBgfhost\fR)\&. .RE .PP \fB\-J\fR \fInum\-of\-processes\fR .RS 4 Specifies the number of processes to retrieve directory entries in parallel\&. .sp The default value is 8\&. .RE .PP \fB\-M\fR \fItotal\-byte\fR .RS 4 Specifies the total file size in bytes to replicate or copy\&. This option is useful to increase the available capacity by moving the specified bytes of files\&. .sp The default value is unlimited\&. .RE .PP \fB\-z\fR \fIminimum\-byte\fR .RS 4 Specifies the minimum file size in bytes to replicate or copy\&. This option is useful not to replicate or copy small files\&. .sp The default value is unlimited\&. .RE .PP \fB\-Z\fR \fImaximum\-byte\fR .RS 4 Specifies the maximum file size in bytes to replicate or copy\&. This option is useful not to replicate or copy large files\&. .sp The default value is unlimited\&. .RE .PP \fB\-w\fR \fIway\-of\-scheduling\fR .RS 4 Specifies a scheduling method\&. ``noplan\*(Aq\*(Aq replicates/copies while finding files\&. ``greedy\*(Aq\*(Aq schedules greedily the order of replication/copy beforehand\&. .sp The default behavior is ``noplan\*(Aq\*(Aq\&. .sp ``greedy\*(Aq\*(Aq scheduling cannot use with the \-N option and \-m option\&. .RE .PP \fB\-W\fR \fIkibibytes\fR .RS 4 Specifies a threshold size/cost(KiB) to flat costs of Connections\&. A Connection means a scheduling information to assign files per a child\-process .sp This option is effective with \-w greedy\&. .sp The default value is 50*1024 KiB (50 MiB)\&. .RE .PP \fB\-I\fR \fIsec\-to\-update\fR .RS 4 Specifies the interval in seconds to collect load average and available capacity\&. .sp Default is 300 seconds\&. .RE .PP \fB\-B\fR .RS 4 Gfarm 2\&.6\&.16 or later does not select high loaded file system nodes\&. This option disables this feature\&. .sp High loaded node is defined by having more CPU load than schedule_busy_load_thresh * number of CPUs\&. For details of schedule_busy_load_thresh, refer to a manual page of gfarm2\&.conf\&. .RE .PP \fB\-U\fR .RS 4 Disables checking the available disk space of the selected node every time\&. .RE .PP \fB\-F\fR \fInum\-of\-dirents\fR .RS 4 Specifies the number of readahead entries to retrieve the directory entries\&. .sp The default value is 10000\&. .RE .PP \fB\-s\fR \fIkilobytes\-per\-second\fR .RS 4 Specifies a throughput(KB/s) to simulate the replication/copy, and does nothing (gets file information only)\&. .RE .PP \fB\-n\fR .RS 4 Does nothing\&. .RE .PP \fB\-p\fR .RS 4 Reports the total performance information\&. .RE .PP \fB\-P\fR .RS 4 Reports the performance information for each file and all files\&. .RE .PP \fB\-q\fR .RS 4 Suppresses non\-error messages\&. .RE .PP \fB\-v\fR .RS 4 Displays verbose output\&. .RE .PP \fB\-d\fR .RS 4 Displays debug output\&. .RE .PP \fB\-?\fR .RS 4 Displays a list of command options\&. .RE .SH "EXAMPLES" .PP To copy files under the directory recursively\&. .sp .if n \{\ .RS 4 .\} .nf $ gfpcopy gfarm:///dir file:///tmp/dir .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf $ gfpcopy file:///tmp/dir gfarm:///dir .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf $ gfpcopy gfarm:///dir1 gfarm:///dir2 .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf $ gfpcopy gfarm:///dir hpss:///tmp/dir .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf $ cd /mnt/gfarm2fs $ gfpcopy dir /tmp/dir .fi .if n \{\ .RE .\} .PP To copy a file\&. .sp .if n \{\ .RS 4 .\} .nf $ gfpcopy gfarm:///dir/file file:///dir .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf $ cd /mnt/gfarm2fs $ gfpcopy file /tmp/dir .fi .if n \{\ .RE .\} .SH "NOTES" .PP To retrieve the directory entries efficiently, it is better to execute \fBgfpcopy\fR command near the metadata server\&. When you need to execute \fBgfpcopy\fR command far from the metadata server, increase the parallelism by the \-j and \-J options\&. .SH "SEE ALSO" .PP \fBgfprep\fR(1), \fBgfreg\fR(1), \fBgfexport\fR(1), \fBgfarm2.conf\fR(5)