CVS-FAST-EXPORT(1)
NAME
cvs-fast-export - fast-export history from a CVS repository or RCS collection.
SYNOPSIS
cvs-fast-export [-h] [-a] [-w fuzz] [-g] [-l] [-v] [-q] [-V] [-T] [-p] [-P] [-i date] [-k expansion] [-A authormap] [-t threads] [-R revmap] [--reposurgeon] [-e remote] [-s stripprefix]
DESCRIPTION
cvs-fast-export tries to group the per-file commits and tags in an RCS file collection or CVS project repository into per-project changeset commits with common metadata. It emits a Git fast-import stream describing these changesets to standard output.
This tool is best used in conjunction with reposurgeon(1). Plain cvs-fast-export conversions contain various sorts of fossils that reposurgeon is good for cleaning up. See Repository Editing and Conversion With Reposurgeon to learn about the sanity-checking and polishing steps required for a really high-quality conversion, including reference lifting and various sorts of artifact cleanup.
If arguments are supplied, the program assumes that all of them ending with the extension ",v" are master files and reads them in. If no arguments are supplied, the program reads filenames from stdin, one per line. Directories and files not ending in ",v" are skipped. (But see the description of the -P option for how to change this behavior.)
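The selection rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name select_masters is hypothetical, and the sketch ignores whatever the -P option changes.

```python
import os


def select_masters(names):
    """Return the names that look like RCS/CVS master files.

    Mirrors the rule described above: directories and names that do not
    end in ",v" are skipped. Hypothetical helper, not part of
    cvs-fast-export itself.
    """
    masters = []
    for name in names:
        name = name.rstrip("\n")      # names may arrive one per line
        if not name.endswith(",v"):
            continue                  # not a master file
        if os.path.isdir(name):
            continue                  # a directory is never a master
        masters.append(name)
    return masters
```

Feeding it the output of a directory walk yields only the candidate masters, in their original order.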
Files from either Unix CVS or CVS-NT are handled. If a collection of files has commitid fields, changesets will be constructed reliably using those.
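As a rough illustration of why commitids make changeset construction reliable, the Python sketch below groups per-file revisions that share a commitid. The FileRev record layout and the function name are assumptions made for this example; the program's actual data structures, and its handling of revisions that lack commitids, are not shown.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class FileRev(NamedTuple):
    """One per-file revision (hypothetical layout for this sketch)."""
    path: str
    rev: str
    commitid: Optional[str]


def group_by_commitid(revs):
    """Split revisions into commitid-keyed changesets and leftovers.

    Revisions sharing a commitid belong to the same changeset by
    construction; revisions without one are returned separately because
    they would need some other, less reliable matching rule.
    """
    changesets = defaultdict(list)
    leftovers = []
    for r in revs:
        if r.commitid:
            changesets[r.commitid].append(r)
        else:
            leftovers.append(r)
    return dict(changesets), leftovers
```

The point of the sketch is that no timestamp comparison is needed for the first group, which is why clock skew does not affect it.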
In the default mode, which generates a git-style fast-export stream to standard output:
See the later section on RCS/CVS LIMITATIONS for more information on edge cases and conversion problems.
This program does not depend on any of the CVS metadata held outside the individual content files (e.g. under CVSROOT).
The variable TMPDIR is honored and used when generating a temporary directory in which to store file content during processing.
This program treats the file contents of the source CVS or RCS repository, and their filenames, as uninterpreted byte sequences to be passed through to the git conversion without re-encoding. In particular, it makes no attempt to fix up line endings (Unix \n vs. Windows \r\n vs. Macintosh \r), nor does it know about what repository filenames might collide with special filenames on any given platform. CVS $-keywords in the masters are not interpreted or expanded; this prevents corruption of binary content.
This program treats change comments as uninterpreted byte sequences to be passed through to the git conversion without change or re-encoding. If you need to re-encode (e.g., from Latin-1 to UTF-8) or remap CVS version IDs to something useful, use cvs-fast-export in conjunction with the transcode and references lift commands of reposurgeon(1).
OPTIONS
-h
-w fuzz
-c
-g
-l
-a
-A authormap
ferd = Ferd J. Foonly <foonly@foo.com> America/Chicago
and will be applied to map the Unix username ferd to the DVCS-style user identity specified after the equals sign. The timezone field (after > and whitespace) is optional and (if present) is used to set the timezone offset to be attached to the date; acceptable formats for the timezone field are anything that can be in the TZ environment variable, including a [+-]hhmm offset. Whitespace around the equals sign is stripped. Lines beginning with a # or not containing an equals sign are silently ignored.
-R revmap
-v
-q
-T
--reposurgeon
--embed-id
-V
-e remote
-s stripprefix
-t threadcount
-p
-P
-i date
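The author-map line format described under -A authormap can be made concrete with a small parser. The Python sketch below follows only the rules stated there (comment lines and lines without an equals sign are ignored, whitespace around the equals sign is stripped, the timezone after ">" is optional). The function name is hypothetical, and treating a right-hand side with no ">" as an identity without a timezone is an assumption of this sketch, not documented behavior.

```python
def parse_authormap_line(line):
    """Parse one author-map line into (username, identity, timezone).

    Returns None for lines that the description says are silently
    ignored: those beginning with "#" or containing no equals sign.
    """
    if line.startswith("#") or "=" not in line:
        return None
    user, _, rest = line.partition("=")
    user = user.strip()               # whitespace around "=" is stripped
    rest = rest.strip()
    close = rest.rfind(">")
    if close == -1:
        # Assumption: no "<email>" part, so no timezone can follow it.
        return user, rest, None
    identity = rest[:close + 1]
    timezone = rest[close + 1:].strip() or None   # optional field
    return user, identity, timezone
```

For the sample line shown under -A, this yields the username ferd, the identity "Ferd J. Foonly <foonly@foo.com>", and the timezone America/Chicago.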
EXAMPLE
A very typical invocation would look like this:
find . | cvs-fast-export >stream.fi
Your cvs-fast-export distribution should also supply cvssync(1), a tool for fetching CVS masters from a remote repository. Using them together will look something like this:
cvssync anonymous@cvs.savannah.gnu.org:/sources/groff groff
find groff | cvs-fast-export >groff.fi
Progress reporting can be reassuring if you expect a conversion to run for some time. It will animate completion percentages as the conversion proceeds and display timings when done.
The cvs-fast-export suite contains a wrapper script called cvsconvert that is useful for running a conversion and automatically checking its content against the CVS original.
RCS/CVS LIMITATIONS
Translating RCS/CVS repositories to the generic DVCS model expressed by import streams is not merely difficult and messy; there are weird RCS/CVS cases that cannot be correctly translated at all. cvs-fast-export will try to warn you about these cases rather than silently producing broken or incomplete translations, but there be dragons. We recommend some precautions under SANITY CHECKING.
Timestamps from CVS histories are not very reliable - CVS made them on the client side rather than at the server; this makes them subject to local clock skew, timezone, and DST issues.
CVS-NT and versions of GNU CVS after 1.12 (2004) added a changeset commit-id to file metadata. Older sections of CVS history without these are vulnerable to various problems caused by clock skew between clients; this used to be relatively common for multiple reasons, including less pervasive use of NTP clock synchronization. cvs-fast-export will warn you ("commits before this date lack commitids") when it sees such a section in your history. When it does, these caveats apply:
One more property affected by commitids is the stability of old changesets under incremental dumping. Under a CVS implementation issuing commitids, new CVS commits are guaranteed not to change cvs-fast-export’s changeset derivation from a previous history; thus, updating a target DVCS repository with incremental dumps from a live CVS installation will work. Even if older portions of the history do not have commitids, conversions will be stable. This stability guarantee is lost if you are using a version of CVS that does not issue commitids.
Also note that a CVS repository has to be completely reanalyzed even for incremental dumps; thus, processing time and memory requirements will rise with the total repository size even when the requested reporting interval of the incremental dump is small.
These problems cannot be fixed in cvs-fast-export; they are inherent to CVS.
REQUIREMENTS AND PERFORMANCE
Because the code is designed for dealing with large data sets, it has been optimized for 64-bit machines and no particular effort has been made to keep it 32-bit clean. Various counters may overflow if you try using it to lift a large repository on a 32-bit machine.
cvs-fast-export is designed to do translation with all its intermediate structures in memory, in one pass. This contrasts with cvs2git(1), which uses multiple passes and journals intermediate structures to disk. The tradeoffs are that cvs-fast-export is much faster than cvs2git (by a ratio of over 100:1 on real repositories), but will fail with an out-of-memory error on CVS repositories large enough that the metadata storage (not the content blobs, just the attributions and comments) overflows your physical memory. In practice, you are unlikely to push this limit on a machine with 32GB of RAM and effectively certain not to with 64GB. Attempts to do large conversions in only a 32-bit (4GB) address space are, on the other hand, unlikely to end well.
The program’s transient RAM requirements can be quite a bit larger; it must slurp in each entire master file once in order to do delta assembly and generate the version snapshots that will become content blobs. Using the -t option multiplies the expected amount of transient storage required by the number of threads; use with care, as it is easy to push memory usage so high that swap overhead overwhelms the gains from not constantly blocking on I/O.
The program also requires temporary disk space equivalent to the sum of the sizes of all revisions in all files.
On stock PC hardware in 2020, cvs-fast-export achieves processing speeds upwards of 64K CVS commits per minute on real repositories. Time performance is primarily I/O bound and can be improved by running on an SSD rather than spinning rust.
LIMITATIONS
Branches occurring in only a subset of the analyzed masters are not correctly resolved; instead, an entirely disjoint history will be created containing the branch revisions and all parents back to the root.
The program does try to do something useful in cases in which a tag occurs in a set of revisions that does not correspond to any gitspace commit. In this case a tagged branch containing only one commit is created, guaranteeing that you can check out a set of files containing the CVS content for the tag. The commit comment is "Synthetic commit for incomplete tag XXX", where XXX is the relevant tag. The root of the branchlet is the gitspace commit where the latest CVS revision in the tagged set first occurs; this is the commit the tag would point at if its incompleteness were ignored. The change in the branchlet commit is also applied forward in the nearby mainline.
This program does the equivalent of cvs -kb when checking out masters, not performing any $-keyword expansion at all. This has the advantage that binary files can never be clobbered, no matter which k option was set on the master. It has the disadvantage that the data in $-headers is not reliable; at best you’ll get the unexpanded version of the $-cookie, at worst you might get the committer/timestamp information for when the master was originally checked in, rather than when it was last checked out. It’s good practice to remove all dollar cookies as part of post-conversion cleanup.
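One way to locate leftover dollar cookies during post-conversion cleanup is a byte-level regular expression. The Python sketch below is an illustration under stated assumptions, not a feature of cvs-fast-export: the keyword list is the standard RCS set, the function names are hypothetical, and only single-line cookies are matched (the body that $Log$ expands to is not handled). It works on bytes, in keeping with the pass-through treatment of file content.

```python
import re

# Standard RCS keyword names; extend if your site defined local ones.
RCS_KEYWORDS = (b"Author", b"Date", b"Header", b"Id", b"Locker", b"Log",
                b"Name", b"RCSfile", b"Revision", b"Source", b"State")

# Matches "$Keyword$" and "$Keyword: value $" on a single line.
COOKIE_RE = re.compile(
    rb"\$(" + b"|".join(RCS_KEYWORDS) + rb")(?::[^$\n]*)?\$"
)


def find_cookies(data: bytes):
    """Return every keyword cookie found in data, in order."""
    return [m.group(0) for m in COOKIE_RE.finditer(data)]


def collapse_cookies(data: bytes) -> bytes:
    """Rewrite expanded cookies to their unexpanded "$Keyword$" form."""
    return COOKIE_RE.sub(lambda m: b"$" + m.group(1) + b"$", data)
```

Whether to collapse the cookies or delete them outright is a project decision; running find_cookies first over text files only is a cautious way to review what would change.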
CVS vendor branches are a source of trouble. Sufficiently strange combinations of imports and local modifications will translate badly, producing incorrect content on master and elsewhere.
Some other CVS exporters try, or have tried, to deduce changesets from shared tags even when comment metadata doesn’t match perfectly. This one does not; the designers judge that to trip over too many pathological CVS tagging cases.
When running multithreaded, there is an edge case in which the program’s behavior is nondeterministic. If the same tag looks like it should be assigned to two different gitspace commits with the same timestamp, which commit it actually lands on will be random.
CVSNT is supported, but the CVSNT extension fields "hardlinks" and "username" are ignored.
Non-ASCII characters in user IDs are not supported.
SANITY CHECKING
After conversion, it is good practice to do the following verification steps:
A more comprehensive sanity check is described in Repository Editing and Conversion With Reposurgeon; browse it for more.
RETURN VALUE
0 if all files were found and successfully converted, 1 otherwise.
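A wrapper script can use this exit status to decide whether a conversion is worth keeping. The Python sketch below feeds master file names on standard input, as described under DESCRIPTION, and reports success only for a zero status. The function name run_export and its parameters are hypothetical, and no command-line options are passed.

```python
import os
import subprocess


def run_export(master_paths, stream_path, command=("cvs-fast-export",)):
    """Run an export, writing the stream to stream_path.

    File names are written to the child's stdin, one per line.
    Returns True when the exit status is 0 and False otherwise.
    """
    names = "".join(path + "\n" for path in master_paths)
    with open(stream_path, "wb") as out:
        proc = subprocess.run(
            list(command),
            input=os.fsencode(names),   # file names pass through as bytes
            stdout=out,
        )
    return proc.returncode == 0
```

The command parameter exists so the wrapper can be exercised with a stand-in program before pointing it at a real repository.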
ERROR MESSAGES
Most of the messages cvs-fast-export emits are self-explanatory. Here are a few that aren’t. Where it says "check head", be sure to sanity-check against the head revision.
null branch name, probably from a damaged Attic file
fatal: internal error - duplicate key in red black tree
tag could not be assigned to a commit
discarding dead untagged branch
warning - unnamed branch
warning - no master branch generated
warning - xxx newer than yyy
warning - skew_vulnerable in file xxx rev yyy set to zzz
tip commit older than imputed branch join
some parent commits are younger than children
warning - branch point later than branch
more than one delta with number X.Y.Z
{revision|patch} with odd depth
duplicate tag in CVS master, ignoring
tag or branch name was empty after sanitization
revision number too long, increase CVS_MAX_DEPTH
snapshot sequence number too large, widen serial_t
too many branches, widen branchcount_t
corrupt delta in
edit script tried to delete beyond eof
internal error - branch cycle
internal error - lost tag
internal error - child commit emitted before parent exists
REPORTING BUGS
Report bugs to Eric S. Raymond <esr@thyrsus.com>. Please read "Reporting bugs in cvs-fast-export" before shipping a report. The project page itself is at http://catb.org/~esr/cvs-fast-export
SEE ALSO
rcs(1), cvs(1), cvssync(1), cvsconvert(1), reposurgeon(1), cvs2git(1).
2020-05-24