NAME¶
cvs-fast-export - fast-export history from a CVS repository or RCS
collection.
SYNOPSIS¶
cvs-fast-export [-h] [-a] [-w fuzz] [-g] [-l] [-v]
[-q] [-V] [-T] [-p] [-P] [-i date] [-k expansion] [-A
authormap] [-t threads] [-R revmap] [--reposurgeon] [-e
remote] [-s stripprefix]
DESCRIPTION¶
cvs-fast-export tries to group the per-file commits and tags in a
RCS file collection or CVS project repository into per-project changeset
commits with common metadata, in the style of Subversion and later
version-control systems.
This tool is best used in conjunction with reposurgeon(1). Plain
cvs-fast-export conversions contain various sorts of fossils that
reposurgeon is good for cleaning up. See the Repository Editing and
Conversion With Reposurgeon to learn about the sanity-checking and polishing
steps required for a really high-quality conversion, including reference
lifting and various sorts of artifact cleanup.
If arguments are supplied, the program assumes all ending with the
extension ",v" are master files and reads them in. If no arguments
are supplied, the program reads filenames from stdin, one per line.
Directories and files not ending in ",v" are skipped. (But see the
description of the -P option for how to change this behavior.)
Files from either Unix CVS or CVS-NT are handled. If a collection
of files has commitid fields, changesets will be constructed reliably using
those.
In the default mode, which generates a git-style fast-export
stream to standard output:
•The prefix given using the -s option or, if the
option is omitted, the longest common prefix of the paths is discarded from
each path.
•Files in CVS Attic and RCS directories are
treated as though the "Attic/" or "RCS/" portion of the
path were absent. This usually restores the history of files that were
deleted.
•Permissions on all fileops related to a
particular file will be controlled by the permissions on the corresponding
master. If the executable bit on the master is on, all its fileops will have
100755 permissions; otherwise 100644.
•A set of file operations is coalesced into a
changeset if either (a) they all share the same commitid, or (b) all have no
commitid but identical change comments, authors, and modification dates within
the window defined by the time-fuzz parameter. Unlike some other exporters, no
attempt is made to derive changesets from shared tags.
•Commits are issued in time order unless the
cvs-fast-export detects that some parent is younger than its child (this is
unlikely but possible in cases of severe clock skew). In that case you will
see a warning on standard error and the emission order is guaranteed
topologically correct, but otherwise not specified (and is subject to change
in future versions of this program).
•CVS tags become git lightweight tags when they
can be unambiguously associated with a changeset. If the same tag is attached
to file deltas that resolve to multiple changesets, it is reported as if
attached to the last of them.
•The HEAD branch is renamed to
master.
•Other tag and branch names are sanitized to be
legal for git; the characters ~^\*? are removed.
•Since .cvsignore files have a syntax
upward-compatible with that of .gitignore files, they’re renamed. In
order to simulate the default ignore behavior of CVS, those defaults are
prepended to root .cvsignore blobs renamed to .gitignore, and a root
.gitignore containing the defaults is generated if no such blobs exist.
See the later section on RCS/CVS LIMITATIONS for more information
on edge cases and conversion problems.
This program does not depend on any of the CVS metadata held
outside the individual content files (e.g. under CVSROOT).
The variable TMPDIR is honored and used when generating a
temporary directory in which to store file content during processing.
This program treats the file contents of the source CVS or RCS
repository, and their filenames. as uninterpreted byte sequences to be
passed through to the git conversion without re-encoding. In particular, it
makes no attempt to fix up line endings (Unix \n vs, Windows \r\n vs.
Macintosh \r), nor does it know about what repository filenames might
collide with special filenames on any given platform. Optionally it may
expand CVS $-keywords, but this is not recommended.
This program treats change comments as uninterpreted byte
sequences to be passed through to the git conversion without change or
re-encoding. If you need to re-encode (e.g, from Latin-1 to UTF-8) or remap
CVS version IDs to something useful, use cvs-fast-export in conjunction with
the transcode and references lift commands of
reposurgeon(1).
OPTIONS¶
-h
Display usage summary.
-w fuzz
Set the timestamp fuzz factor for identifying patch sets
in seconds. The default is 300 seconds. This option is irrelevant for
changesets with commitids.
-c
Don’t trust commit-IDs; match by ordinary
metatada. Will be useful if you have something like a CVS-NT repository in
which per-file commits were made in such a way that the cliques don’t
have matching IDs.
-g
generate a picture of the commit graph in the DOT markup
language used by the graphviz tools, rather than fast-exporting.
-l
Warnings normally go to standard error. This option,
which takes a filename, allows you to redirect them to a file. Convenient with
the -p option.
-a
Dump a list of author IDs found in the repository, rather
than fast-exporting.
-A authormap
Apply an author-map file to the attribution lines. Each
line must be of the form
ferd = Ferd J. Foonly <foonly@foo.com> America/Chicago
and will be applied to map the Unix username ferd to the
DVCS-style user identity specified after the equals sign. The timezone field
(after > and whitespace) is optional and (if present) is used to set the
timezone offset to be attached to the date; acceptable formats for the
timezone field are anything that can be in the TZ environment variable,
including a [+-]hhmm offset. Whitespace around the equals sign is stripped.
Lines beginning with a # or not containing an equals sign are silently
ignored.
-R revmap
Write a revision map to the specified argument filename.
Each line of the revision map consists of three whitespace-separated fields: a
filename, an RCS revision number, and the mark of the commit to which that
filename-revision pair was assigned. Doesn’t work with -g.
-v
Show verbose progress messages mainly of interest to
developers.
-q
Run quietly, suppressing warning messages about absence
of commitids and other minor problems for which the program can usually
compensate but which may indicate conversion problems. Meant to be used with
cvsconvert, which does its own correctness checking.
-T
Force deterministic dates for regression testing. Each
patchset will have a monotonic-increasing attributed date computed from its
mark in the output stream - the mark value times the commit time window times
two.
--reposurgeon
Emit for each commit a list of the CVS file:revision
pairs composing it as a bzr-style commit property named
"cvs-revisions". From version 2.12 onward,
reposurgeon(1) can
interpret these and use them as hints for reference-lifting. Also, suppresses
emission of "done" trailer.
--embed-id
Append to each commit comment identification of the CVS
commits that contributed to it.
-V
Emit the program version and exit.
-e remote
Exported branch names are prefixed with
refs/remotes/remote instead of refs/heads, making the import appear to
come from the named remote.
-s stripprefix
Strip the given prefix instead of longest common
prefix
-t threadcount
Running multithreaded increases the program’s
memory footprint proportionally to the number of threads, but means the
conversion may run in less total time because an I/O operation involving one
master file will not block compute-intensive processing of others. By default,
the program conservatively assumes it can use two threads per processor
available. You can use this option to set the number of threads; the value 0
forces sequential processing with no threading.
-p
Enable progress reporting. This also dumps statistics
(elapsed time and size of maximum resident set) for several points in the
conversion run.
-P
Normally cvs-fast-export will skip any filename presented
as an argument or on stdin that does not end with the RCS/CVS extension
",v", and will also ignore a pathname containing the string CVSROOT
(this avoids annoyances when running from or above a top-level CVS directory).
A strict reading of RCS allows masters without the ,v extension. This option
sets promiscuous mode, disabling both checks.
-i date
Enable incremental-dump mode. Only commits with a date
after that specified by the argument are emitted. Disables inclusion of
default ignores. Each branch root in the incremental dump is decorated with
git-stream magic which, when interpreted in context of a live repository, will
connect that branch to any branch of the same name. The date is expected to be
RFC3339 conformant (e.g. yy-mm-ddThh:mm:ssZ) or else an integer Unix time in
seconds.
EXAMPLE¶
A very typical invocation would look like this:
find . | cvs-fast-export >stream.fi
Your cvs-fast-export distribution should also supply cvssync(1), a
tool for fetching CVS masters from a remote repository. Using them together
will look something like this:
cvssync anonymous@cvs.savannah.gnu.org:/sources/groff groff
find groff | cvs-fast-export >groff.fi
Progress reporting can be reassuring if you expect a conversion to
run for some time. It will animate completion percentages as the conversion
proceeds and display timings when done.
The cvs-fast-export suite contains a wrapper script called
cvsconvert that is useful for running a conversion and automatically
checking its content against the CVS original.
RCS/CVS LIMITATIONS¶
Translating RCS/CVS repositories to the generic DVCS model
expressed by import streams is not merely difficult and messy, there are
weird RCS/CVS cases that cannot be correctly translated at all.
cvs-fast-export will try to warn you about these cases rather than silently
producing broken or incomplete translations, but there be dragons. We
recommend some precautions under SANITY CHECKING.
Timestamps from CVS histories are not very reliable - CVS made
them on the client side rather than at the server; this makes them subject
to local clock skew, timezone, and DST issues.
Time-skew effects combined with the way $-headers have to be
expanded to be compatible with CVS’s also imoly that if you have
$-headers in your code they may not be expanded identically in a tag
checkout from git to the way they were in the corresponding CVS
revisions.
CVS-NT and versions of GNU CVS after 1.12 (2004) added a changeset
commit-id to file metadata. Older sections of CVS history without these are
vulnerable to various problems caused by clock skew between clients; this
used to be relatively common for multiple reasons, including less pervasive
use of NTP clock synchronization. cvs-fast-export will warn you
("commits before this date lack commitids") when it sees such a
section in your history. When it does, these caveats apply:
•If timestamps of commits in the CVS repository
were not stable enough to be used for ordering commits, changes may be
reported in the wrong order.
•If the timestamp order of different files crosses
the revision order within the commit-matching time window, the order of
commits reported may be wrong.
One more property affected by commitids is the stability of old
changesets under incremental dumping. Under a CVS implementation issuing
commitids, new CVS commits are guaranteed not to change
cvs-fast-export’s changeset derivation from a previous history; thus,
updating a target DVCS repository with incremental dumps from a live CVS
installation will work. Even if older portions of the history do not have
commitids, conversions will be stable. This stability guarantee is lost if
you are using a version of CVS that does not issue commitids.
Also note that a CVS repository has to be completely reanalyzed
even for incremental dumps; thus, processing time and memory requirements
will rise with the total repository size even when the requested reporting
interval of the incremental dump is small.
These problems cannot be fixed in cvs-fast-export; they are
inherent to CVS.
CVS-FAST-EXPORT REQUIREMENTS AND LIMITATIONS¶
Because the code is designed for dealing with large data sets, it
has been optimized for 64-bit machines and no particular effort has been
made to keep it 32-bit clean. Various counters may overflow if you try using
it to lift a large repository on a 32-bit machine.
Branches occurring in only a subset of the analyzed masters are
not correctly resolved; instead, an entirely disjoint history will be
created containing the branch revisions and all parents back to the
root.
CVS vendor branches are a source of trouble. Sufficiently strange
combinations of imports and local modifications will translate badly,
producing incorrect content on master and elsewhere.
Some other CVS exporters try, or have tried, to deduce changesets
from shared tags even when comment metadata doesn’t match perfectly.
This one does not; the designers judge that to trip over too many
pathological CVS tagging cases.
The program does try to do something useful cases in which a tag
occurs in a set of revisions that does not correspond to any gitspace
commit. In this case a tagged branch containing only one commit is created,
guaranteeing that you can check out a set of files containing the CVS
content for the tag. The commit comment is "Synthetic commit for
incomplete tag XXX", where XXX is the relevant tag. The root of the
branchlet is the gitspace commit where the latest CVS revision in in the
tagged set first occurs; this is the commit the tag would point at if its
incompleteness were ignored. The change in the branchlet commit is
also applied forward in the nearby mainline.
When running multithreaded, there is an edge case in which the
program’s behavior is nondeterministic. If the same tag looks like it
should be assigned to two different gitspace commits with the same
timestamp, which tag it actually lands on will be random.
cvs-fast-export is designed to do translation with all its
intermediate structures in memory, in one pass. This contrasts with
cvs2git(1), which uses multiple passes and journals intermediate structures
to disk. The tradeoffs are that cvs-fast-export is much faster than cvs2git
(by a ratio of over 100:1 on real repositories), but will fail with an
out-of-memory error on CVS repositories large enough to overflow your
physical memory. In practice, you are unlikely to push this limit on a
machine with 32GB of RAM and effectively certain not to with 64GB. Attempts
to do large conversions in only a 32-bit (4GB) address space are, on the
other hand, unlikely to end well.
The program’s transient RAM requirements can be quite a bit
larger; it must slurp in each entire master file once in order to do delta
assembly and generate the version snapshots that will become snapshots.
Using the -t option multiplies the expected amount of transient storage
required by the number of threads; use with care, as it is easy to push
memory usage so high that swap overhead overwhelms the gains from not
constantly blocking on I/O.
The program also requires temporary disk space equivalent to the
sum of the sizes of all revisions in all files.
On stock PC hardware in 2020, cvs-fast-export achieves processing
speeds upwards of 64K CVS commits per minute on real repositories. Time
performance is primarily I/O bound and can be improved by running on an
SSD.
SANITY CHECKING¶
After conversion, it is good practice to do the following
verification steps:
1.If you ran the conversion directly with
cvs-fast-export rather than using cvsconvert, use
diff(1) with the -r option
to compare a CVS head checkout with a checkout of the converted repository.
The only differences you should see are those due to RCS keyword expansion,
.cvsignore lifting, and manifest mismatches due to CVS not tracking file
deaths quite correctly. If this is not true, you may have found a bug in
cvs-fast-export; please report it with a copy of the CVS repo.
2.Examine the translated repository with
reposurgeon(1)
looking (in particular) for misplaced tags or branch joins. Often these can be
manually repaired with little effort. These flaws do
not necessarily
imply bugs in cvs-fast-export; they may simply indicate previously undetected
malformations in the CVS history. However, reporting them may help improve
cvs-fast-export.
A more comprehensive sanity check is described in Repository
Editing and Conversion With Reposurgeon; browse it for more.
RETURN VALUE¶
0 if all files were found and successfully converted, 1
otherwise.
ERROR MESSAGES¶
Most of the messages cvs-fast-export emits are self-explanatory.
Here are a few that aren’t. Where it says "check head", be
sure to sanity-check against the head revision.
null branch name, probably from a damaged Attic file
The code was unable to deduce a name for a branch and
tried to export a null pointer as a name. The branch is given the name
"null". It is likely this history will need repair.
fatal: internal error - duplicate key in red black tree
Multiple tags with identical names exist in one of your
master files. This is a sign of a corrupted revision history; you will need to
manually inspect the master and remove one of the duplicates.
child commit emitted before parent exists
Skew in client timestamps produced a situation in which
time order of parent and child commits is backwards. If you are running
multithreaded, this probably means two CVS commits with identical timestamps
were randomly processed in the wrong order; try forcing single-thread
operation wuth -t 0. If the error recurs with -t 0, this indicates a serious
metadata malformation or cvs-fast-export bug and should be reportedvto the
maintainers.
tag could not be assigned to a commit
RCS/CVS tags are per-file, not per revision. If
developers are not careful in their use of tagging, it can be impossible to
associate a tag with any of the changesets that cvs-fast-export resolves. When
this happens, cvs-fast-export will issue this warning and the tag named will
be discarded.
discarding dead untagged branch
Analysis found a CVS branch with no tag consisting
entirely of dead revisions. These cannot have been visible in the archival
state of the CVS at conversion time; it is possible they may have been visible
as branch content at some point in the repository’s past, but without
an identifying tag that state is impossible to reconstruct.
warning - unnamed branch
A CVS branch with a live revision lacks a head label. A
label with "-UNNAMED-BRANCH" suffixed to the name of the parent
branch will be generated.
warning - no master branch generated
cvs-fast-export could not identify the default (HEAD)
branch and therefore there is no "master" in the conversion; this
will seriously confuse git and probably other VCSes when they try to import
the output stream. You may be able to identify and rename a master branch
using
reposurgeon(1).
warning - xxx newer than yyy
Early in analysis of a CVS master file, time sort order
of its deltas doesn’t match the topological order defined by the
revision numbers. The most likely cause of this is clock skew between clients
in very old CVS versions. The program will attempt to correct for this by
tweaking the revision date of the out-of-order commit to be that of its
parent, but this may not prevent other time-skew errors later in
analysis.
warning - skew_vulnerable in file xxx rev yyy set to zzz
This warning is emitted when verbose is on and only on
commits with no commit ID. It calls out commits that coause the date before
which coalescence is unreliable to be set forward.
tip commit older than imputed branch join
A similar problem to "newer than" being
reported at a later stage, when file branches are being knit into changeset
branches. One CVS branch in a collection about to be collated into a gitspace
branch has a tip commit older than the earliest commit that is a a parent on
some (other) tip in the collection. The adventious branch is snipped
off.
some parent commits are younger than children
May indicate that cvs-fast-export aggregated some
changesets in the wrong order; probably harmless, but check head.
warning - branch point later than branch
Late in the analysis, when connecting branches to their
parents in the changeset DAG, the commit date of the root commit of a branch
is earlier than the date of the parent it gets connected to. Could be yet
another clock-skew symptom, or might point to an error in the program’s
topological analysis. Examine commits near the join with
reposurgeon(1); the
branch may need to be reparented by hand.
more than one delta with number X.Y.Z
The CVS history contained duplicate file delta numbers.
Should never happen, and may indice a corrupted CVS archive if it does; check
head.
{revision|patch} with odd depth
Should never happen; only branch numbers are supposed to
have odd depth, not file delta or patch numbers. May indicate a corrupted CVS
archive; check head.
duplicate tag in CVS master, ignoring
A CVS master has multiple instances of the same tag
pointing at different file deltas. Probably a CVS operator error and
relatively harmless, but check that the tag’s referent in the
conversion makes sense.
tag or branch name was empty after sanitization
Fatal error: tag name was empty after all characters
illegal for git were removed. Probably indicates a corrupted RCS file.
revision number too long, increase CVS_MAX_DEPTH
Fatal error: internal buffers are too short to handle a
CVS revision in a repo. Increase this constant in cvs.h and rebuild. Warning:
this will increase memory usage and slow down the tests a lot.
snapshot sequence number too large, widen serial_t
Fatal error: the number of file snapshots in the CVS repo
overruns an internal counter. Rebuild cvs-fast-export from source with a wider
serial_t patched into cvs.h. Warning: this will significantly increase the
working-set size
too many branches, widen branchcount_t
Fatal error: the number of branches descended from some
single commit overruns an internal counter. Rebuild cvs-fast-export from
source with a wider branchcount_t patched into cvs.h. Warning: this will
significantly increase the working-set size
corrupt delta in
The text of a delta is expected to be led with d (delete)
and a (append) lines describing line-oriented changes at that delta. When you
see this message, these are garbled.
edit script tried to delete beyond eof
Indicates a corrupted RCS file. An edit line count was
wrong, possibly due to an integer overflow in an old 32-bit version of
RCS.
internal error - branch cycle
cvs-fast-export found a cycle while topologically sorting
commits by parent link. This should never happen and probably indicates a
serious internal error: please file a bug report.
internal error - lost tag
Late in analysis (after changeset coalescence) a tag lost
its commit reference. This should never happen and probably indicates an
internal error: please file a bug report.