NAME
git-maintenance - Run tasks to optimize Git repository data
SYNOPSIS
git maintenance run [<options>]
DESCRIPTION
Run tasks to optimize Git repository data, speeding up other Git
commands and reducing storage requirements for the repository.
Git commands that add repository data, such as git add or
git fetch, are optimized for a responsive user experience. These
commands do not take time to optimize the Git data, since such optimizations
scale with the full size of the repository while these user commands each
perform a relatively small action.
The git maintenance command provides flexibility for how to
optimize the Git repository.
SUBCOMMANDS
register
Initialize Git config values so any scheduled maintenance
will start running on this repository. This adds the repository to the
maintenance.repo config variable in the current user’s global
config and enables some recommended configuration values for
maintenance.<task>.schedule. The tasks that are enabled are safe
for running in the background without disrupting foreground processes.
The register subcommand will also set the
maintenance.strategy config value to incremental, if this
value is not previously set. The incremental strategy uses the
following schedule for each maintenance task:
• gc: disabled.
• commit-graph: hourly.
• prefetch: hourly.
• loose-objects: daily.
• incremental-repack: daily.
git maintenance register will also disable foreground
maintenance by setting maintenance.auto = false in the current
repository. This config setting will remain after a git maintenance
unregister command.
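The config writes described above can be reproduced by hand with git config.
The following is a minimal sketch, not what git maintenance register itself
executes. It works in a throwaway repository with HOME pointed at a scratch
directory (both are assumptions, made so that no real configuration is
touched), and it covers only the three keys named in this section.

```shell
#!/bin/sh
# Sketch: hand-written approximation of the config that `register` sets.
set -eu

scratch=$(mktemp -d)
# Isolate global and system config so real settings are never modified.
export HOME="$scratch/home" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
mkdir -p "$HOME"

git init -q "$scratch/repo"
cd "$scratch/repo"
repo_path=$(pwd)

# 1. Add the repository to the multi-valued maintenance.repo list (global).
git config --global --add maintenance.repo "$repo_path"

# 2. Choose the incremental strategy only if no strategy is set yet.
#    Assumption: stored in the repository config; the text above does not
#    say which config file Git writes this value to.
git config --get maintenance.strategy >/dev/null ||
    git config maintenance.strategy incremental

# 3. Disable foreground maintenance in this repository.
git config maintenance.auto false

registered=$(git config --global --get-all maintenance.repo)
strategy=$(git config --get maintenance.strategy)
auto=$(git config --get maintenance.auto)
printf 'repo=%s strategy=%s auto=%s\n' "$registered" "$strategy" "$auto"
```

Reading the values back with git config --get is also a quick way to check
what a real register call left behind.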
run
Run one or more maintenance tasks. If one or more
--task options are specified, then those tasks are run in that order.
Otherwise, the tasks are determined by which
maintenance.<task>.enabled config options are true. By default,
only maintenance.gc.enabled is true.
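The selection rule for run can be modelled in a few lines of shell. This is
an illustrative model only, not Git's implementation: select_tasks is a
hypothetical helper, and the order in which enabled tasks are listed (the
order of the TASKS section) is an assumption.

```shell
#!/bin/sh
# Model of the task-selection rule for `run`; not Git's own code.
set -eu

scratch=$(mktemp -d)
export HOME="$scratch/home" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
git init -q "$scratch/repo"
cd "$scratch/repo"

# select_tasks [<task>...]: print the tasks a run would consider.
select_tasks() {
    # Explicit --task values win and keep the order in which they were given.
    if [ "$#" -gt 0 ]; then
        echo "$*"
        return 0
    fi
    selected=""
    # Assumption: iterate in the order the TASKS section lists the tasks.
    for task in commit-graph prefetch gc loose-objects incremental-repack; do
        # Only gc counts as enabled when nothing is configured.
        if [ "$task" = gc ]; then fallback=true; else fallback=false; fi
        enabled=$(git config --bool --get "maintenance.$task.enabled" ||
            echo "$fallback")
        if [ "$enabled" = true ]; then
            selected="${selected:+$selected }$task"
        fi
    done
    echo "$selected"
}

default_tasks=$(select_tasks)
explicit_tasks=$(select_tasks loose-objects commit-graph)
git config maintenance.prefetch.enabled true
configured_tasks=$(select_tasks)
printf '%s\n' "$default_tasks" "$explicit_tasks" "$configured_tasks"
```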
start
Start running maintenance on the current repository. This
performs the same config updates as the register subcommand, then
updates the background scheduler to run git maintenance run --schedule
on an hourly basis.
stop
Halt the background maintenance schedule. The current
repository is not removed from the list of maintained repositories, in case
the background maintenance is restarted later.
unregister
Remove the current repository from background
maintenance. This only removes the repository from the configured list. It
does not stop the background maintenance processes from running.
TASKS
commit-graph
The commit-graph job updates the
commit-graph files incrementally, then verifies that the written data
is correct. The incremental write is safe to run alongside concurrent Git
processes since it will not expire .graph files that were in the
previous commit-graph-chain file. They will be deleted by a later run
based on the expiration delay.
prefetch
The prefetch task updates the object directory with the latest objects from
all registered remotes. For each remote, a git fetch command is run. The
refmap is custom to avoid updating local or remote branches (those in
refs/heads or refs/remotes). Instead, the remote refs are stored in
refs/prefetch/<remote>/. Also, tags are not updated.
This is done to avoid disrupting the remote-tracking branches. End users
expect these refs to stay unmoved unless they initiate a fetch. With the
prefetch task, however, the objects necessary to complete a later real fetch
have already been obtained, so the real fetch goes faster. In the ideal
case, it becomes an update to a set of remote-tracking branches without any
object transfer.
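The effect of the custom refmap can be observed with an ordinary git fetch.
The sketch below approximates the behaviour described above and is not the
exact command line the task runs: the scratch repositories, the trunk
branch name and the demo identity are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch: fetch into refs/prefetch/ without moving remote-tracking refs.
set -eu

scratch=$(mktemp -d)
export HOME="$scratch" GIT_CONFIG_NOSYSTEM=1

# Helper (hypothetical): create an empty commit in the given repository.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$2"
}

git init -q "$scratch/upstream"
git -C "$scratch/upstream" symbolic-ref HEAD refs/heads/trunk
commit "$scratch/upstream" first
git clone -q "$scratch/upstream" "$scratch/clone"
commit "$scratch/upstream" second   # upstream moves ahead of the clone

before=$(git -C "$scratch/clone" rev-parse refs/remotes/origin/trunk)

# An empty --refmap ignores the configured fetch refspec, so only the
# refspec given on the command line is used; --no-tags skips tag updates.
git -C "$scratch/clone" fetch -q --no-tags --refmap= origin \
    '+refs/heads/*:refs/prefetch/origin/*'

after=$(git -C "$scratch/clone" rev-parse refs/remotes/origin/trunk)
prefetched=$(git -C "$scratch/clone" rev-parse refs/prefetch/origin/trunk)
upstream_tip=$(git -C "$scratch/upstream" rev-parse refs/heads/trunk)
printf 'tracking unchanged: %s\n' "$([ "$before" = "$after" ] && echo yes)"
```

After this, a plain git fetch in the clone only has to move
refs/remotes/origin/trunk, because the new commit is already present.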
gc
Clean up unnecessary files and optimize the local
repository. "GC" stands for "garbage collection," but this
task performs many smaller tasks. This task can be expensive for large
repositories, as it repacks all Git objects into a single pack-file. It can
also be disruptive in some situations, as it deletes stale data. See
git-gc(1) for more details on garbage collection in Git.
loose-objects
The loose-objects job cleans up loose objects and
places them into pack-files. In order to prevent race conditions with
concurrent Git commands, it follows a two-step process. First, it deletes any
loose objects that already exist in a pack-file; concurrent Git processes will
examine the pack-file for the object data instead of the loose object. Second,
it creates a new pack-file (starting with "loose-") containing a
batch of loose objects. The batch size is limited to 50 thousand objects to
prevent the job from taking too long on a repository with many loose objects.
The gc task writes unreachable objects as loose objects to be cleaned
up by a later step only if they are not re-added to a pack-file; for this
reason it is not advisable to enable both the loose-objects and
gc tasks at the same time.
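The two-step process can be walked through with plumbing commands. This is
a sketch under stated assumptions rather than the task's actual code: it
uses git prune-packed and git pack-objects as stand-ins for the two steps,
finds loose objects by scanning the object directory, and ignores the batch
size limit.

```shell
#!/bin/sh
# Sketch: the loose-objects two-step process, done by hand.
set -eu

scratch=$(mktemp -d)
export HOME="$scratch" GIT_CONFIG_NOSYSTEM=1
git init -q "$scratch/repo"
cd "$scratch/repo"

loose_count()  { git count-objects -v | sed -n 's/^count: //p'; }
packed_count() { git count-objects -v | sed -n 's/^in-pack: //p'; }

# Create three loose blobs to work with.
for n in 1 2 3; do
    printf 'blob %s\n' "$n" | git hash-object -w --stdin >/dev/null
done
loose_start=$(loose_count)

# Step 1: delete loose objects that already exist in a pack (none yet).
git prune-packed -q

# Step 2: write the remaining loose objects into a pack named loose-<hash>.
# Loose objects live at objects/<2 hex digits>/<remaining hex digits>.
find .git/objects -type f -path '*/[0-9a-f][0-9a-f]/*' |
    sed -e 's|.*/\(..\)/\([0-9a-f]*\)$|\1\2|' |
    git pack-objects -q .git/objects/pack/loose >/dev/null
loose_after_pack=$(loose_count)
packed_after_pack=$(packed_count)

# Step 1 of the next run removes the now-redundant loose copies.
git prune-packed -q
loose_end=$(loose_count)
printf 'loose: %s -> %s -> %s, packed: %s\n' \
    "$loose_start" "$loose_after_pack" "$loose_end" "$packed_after_pack"
```

The loose copies survive step 2 on purpose: a concurrent process that has
already located a loose object can still read it, and the copies disappear
only once the pack that contains them is in place.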
incremental-repack
The incremental-repack job repacks the object directory using the
multi-pack-index feature. In order to prevent race conditions with
concurrent Git commands, it follows a two-step process. First, it calls
git multi-pack-index expire to delete pack-files unreferenced by the
multi-pack-index file. Second, it calls git multi-pack-index repack to
select several small pack-files and repack them into a bigger one, and then
update the multi-pack-index entries that refer to the small pack-files to
refer to the new pack-file. This prepares those small pack-files for
deletion upon the next run of git multi-pack-index expire. The selection of
the small pack-files is such that the expected size of the big pack-file is
at least the batch size; see the --batch-size option for the repack
subcommand in git-multi-pack-index(1). The default batch-size is zero, which
is a special case that attempts to repack all pack-files into a single
pack-file.
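The expire and repack calls can be tried by hand in a scratch repository.
This is a sketch, not the task's own code: the initial git multi-pack-index
write is an assumption added so that an index exists to operate on, and the
three small packs are produced artificially with git repack -d.

```shell
#!/bin/sh
# Sketch: expire, repack, then expire again on the next "run".
set -eu

scratch=$(mktemp -d)
export HOME="$scratch" GIT_CONFIG_NOSYSTEM=1
git init -q "$scratch/repo"
cd "$scratch/repo"

count_packs() {
    find .git/objects/pack -name '*.pack' | wc -l | tr -d ' '
}

# Produce three small packs, one per commit.
for n in 1 2 3; do
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "commit $n"
    git repack -q -d
done
packs_before=$(count_packs)

# Assumption: create the multi-pack-index explicitly for this demo.
git multi-pack-index write

# First run: nothing to expire yet, then repack the small packs into one.
git multi-pack-index expire
git multi-pack-index repack --batch-size=0

# Next run: the small packs are no longer referenced and can be deleted.
git multi-pack-index expire
packs_after=$(count_packs)
printf 'packs: %s -> %s\n' "$packs_before" "$packs_after"
```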
OPTIONS
--auto
When combined with the run subcommand, run
maintenance tasks only if certain thresholds are met. For example, the
gc task runs when the number of loose objects exceeds the number stored
in the gc.auto config setting, or when the number of pack-files exceeds
the gc.autoPackLimit config setting. Not compatible with the
--schedule option.
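The gc thresholds mentioned above can be approximated with a small check.
This is a rough model, not the test Git performs: gc_auto_would_run is a
hypothetical helper, it counts loose objects exactly via git count-objects
(Git itself estimates), and it falls back to the documented defaults of
6700 for gc.auto and 50 for gc.autoPackLimit when nothing is configured.

```shell
#!/bin/sh
# Model: would the gc task's --auto condition plausibly be met?
set -eu

scratch=$(mktemp -d)
export HOME="$scratch" GIT_CONFIG_NOSYSTEM=1
git init -q "$scratch/repo"
cd "$scratch/repo"

gc_auto_would_run() {
    loose_limit=$(git config --int --get gc.auto || echo 6700)
    pack_limit=$(git config --int --get gc.autoPackLimit || echo 50)
    loose=$(git count-objects -v | sed -n 's/^count: //p')
    packs=$(git count-objects -v | sed -n 's/^packs: //p')
    # gc.auto = 0 disables automatic garbage collection altogether.
    [ "$loose_limit" -gt 0 ] || return 1
    [ "$loose" -gt "$loose_limit" ] && return 0
    # gc.autoPackLimit = 0 disables only the pack-count check.
    [ "$pack_limit" -gt 0 ] && [ "$packs" -gt "$pack_limit" ]
}

git config gc.auto 2    # tiny threshold so the demo can exceed it
if gc_auto_would_run; then verdict_empty=run; else verdict_empty=skip; fi

for n in 1 2 3; do
    printf 'blob %s\n' "$n" | git hash-object -w --stdin >/dev/null
done
if gc_auto_would_run; then verdict_loose=run; else verdict_loose=skip; fi
printf 'empty repo: %s, three loose objects: %s\n' \
    "$verdict_empty" "$verdict_loose"
```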
--schedule
When combined with the run subcommand, run
maintenance tasks only if certain time conditions are met, as specified by the
maintenance.<task>.schedule config value for each
<task>. This config value specifies a number of seconds since the
last time that task ran, according to the
maintenance.<task>.lastRun config value. The tasks that are
tested are those provided by the --task=<task> option(s) or those
with maintenance.<task>.enabled set to true.
--quiet
Do not report progress or other information over
stderr.
--task=<task>
If this option is specified one or more times, then only
run the specified tasks in the specified order. If no
--task=<task> arguments are specified, then only the tasks with
maintenance.<task>.enabled configured as true are
considered. See the TASKS section for the list of accepted
<task> values.
TROUBLESHOOTING
The git maintenance command is designed to simplify the
repository maintenance patterns while minimizing user wait time during Git
commands. A variety of configuration options are available to allow
customizing this process. The default maintenance options focus on
operations that complete quickly, even on large repositories.
Users may find some cases where scheduled maintenance tasks do not
run as frequently as intended. Each git maintenance run command takes
a lock on the repository’s object database, and this prevents other
concurrent git maintenance run commands from running on the same
repository. Without this safeguard, competing processes could leave the
repository in an unpredictable state.
The background maintenance schedule runs git maintenance
run processes on an hourly basis. Each run executes the
"hourly" tasks. At midnight, that process also executes the
"daily" tasks. At midnight on the first day of the week, that
process also executes the "weekly" tasks. A single process
iterates over each registered repository, performing the scheduled tasks for
that frequency. Depending on the number of registered repositories and their
sizes, this process may take longer than an hour. In this case, multiple
git maintenance run commands may run on the same repository at the
same time, colliding on the object database lock. This results in one of the
two tasks not running.
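The hourly, daily and weekly cadence described above amounts to a simple
rule on the wall-clock time. The helper below is a hypothetical
illustration of that rule, not part of Git. It takes the hour (0-23), the
weekday number, and optionally which weekday number counts as the first day
of the week (assumed to be 0 here).

```shell
#!/bin/sh
# Model: which task frequencies are due at a given hour and weekday?
set -eu

# frequencies_for <hour> <weekday> [<first-weekday>]
frequencies_for() {
    hour=$1
    weekday=$2
    first_weekday=${3:-0}
    due="hourly"                         # every run executes hourly tasks
    if [ "$hour" -eq 0 ]; then
        due="$due daily"                 # midnight adds the daily tasks
        if [ "$weekday" -eq "$first_weekday" ]; then
            due="$due weekly"            # first day of the week adds weekly
        fi
    fi
    echo "$due"
}

frequencies_for 13 3
frequencies_for 0 3
frequencies_for 0 0
```

The midnight run on the first day of the week carries all three sets of
tasks, which makes it the most likely run to exceed one hour and collide
with the run that follows it.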
If you find that some maintenance windows are taking longer than
one hour to complete, then consider reducing the complexity of your
maintenance tasks. For example, the gc task is much slower than the
incremental-repack task, so replacing gc with incremental-repack shortens
each run at the cost of a slightly larger object database. Also consider
running the more expensive tasks less frequently.
Expert users may consider scheduling their own maintenance tasks
using a different schedule than is available through git maintenance
start and Git configuration options. These users should be aware of the
object database lock and how concurrent git maintenance run commands
behave. Further, the git gc command should not be combined with
git maintenance run commands. git gc modifies the object
database but does not take the lock in the same way as git maintenance
run. If possible, use git maintenance run --task=gc instead of
git gc.