Summary: migrate all Git storage to the new gitaly-01 back-end in the
coming week, with each Git repository read-only during its migration.
Proposal
Progressively move all Git repositories to the new Gitaly server
during Week 29. It will be impossible to push new commits to a
repository while it is being migrated.
This should result in a series of short (seconds to minutes), scoped
outages, as each repository is marked read-only, one at a time, while
it is migrated. See “Impact” below for what that means more precisely.
The Gitaly migration procedure seems well tested and robust, as each
repository is checksummed before and after migration.
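To illustrate the kind of integrity check described above, here is a minimal Python sketch of an order-independent repository checksum: it hashes each `git show-ref` line (object ID plus ref name) and XORs the digests, so the same set of refs always yields the same value regardless of listing order. This is a simplified illustration under our own assumptions, not necessarily Gitaly’s exact algorithm, and the helper names `ref_checksum` and `repo_checksum` are hypothetical.

```python
import hashlib
import subprocess


def ref_checksum(ref_lines):
    """XOR together the SHA-1 digests of each "<oid> <refname>" line.

    XOR is commutative, so the result does not depend on line order.
    An empty list of refs yields the all-zero checksum.
    """
    acc = 0
    for line in ref_lines:
        digest = hashlib.sha1(line.strip().encode("utf-8")).digest()
        acc ^= int.from_bytes(digest, "big")
    # SHA-1 digests are 160 bits, i.e. 40 hex characters.
    return f"{acc:040x}"


def repo_checksum(repo_path):
    """Compute ref_checksum() for a (bare or non-bare) repository on disk."""
    # `git show-ref` prints one "<oid> <refname>" line per ref; it exits
    # with status 1 when there are no refs, which we treat as "empty".
    result = subprocess.run(
        ["git", "-C", repo_path, "show-ref"],
        capture_output=True,
        text=True,
    )
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return ref_checksum(result.stdout.splitlines())
```

Comparing `repo_checksum()` on the source and destination copies would very likely flag any ref that was lost or changed during a move. Note that such a checksum only covers refs: it assumes the objects those refs point to were copied intact.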
We hope this will improve overall performance of the GitLab server;
it is also part of the design upstream GitLab suggests for scaling an
installation of our size.
Affected projects
We plan on migrating the following namespaces, in order:
alpha phase, day one (2025-07-14)
This is mostly dogfooding and automation:
- anarcat (already done)
- tpo/tpa
- tpo/web
beta phase, day two (2025-07-15)
This phase includes testers outside of TPA, but on projects that are
less mission-critical and could survive some issues with their Git
repositories.
- tpo/community
- tpo/onion-services
- tpo/anti-censorship
- tpo/network-health
production phase, day two or three (2025-07-15+)
This is essentially all remaining projects:
- tpo/core (includes c-tor and Arti!)
- tpo/applications (includes Tor Browser and Mullvad Browser)
- all remaining projects
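The phased order above could be scripted against GitLab’s project repository storage moves API (`POST /projects/:project_id/repository_storage_moves`, administrator-only). The following Python sketch is hypothetical: the `PHASES` table simply mirrors the lists above, while the storage name `gitaly-01`, the base URL and the token handling are assumptions. It only schedules one move per call and leaves polling for completion to the caller.

```python
import json
import urllib.request

# Namespaces from the proposal, in migration order. Projects not listed
# fall at the end of the last phase ("all remaining projects").
PHASES = [
    ("alpha", ["anarcat", "tpo/tpa", "tpo/web"]),
    ("beta", ["tpo/community", "tpo/onion-services",
              "tpo/anti-censorship", "tpo/network-health"]),
    ("production", ["tpo/core", "tpo/applications"]),
]


def rank(project_path):
    """Return (phase index, namespace index); unlisted projects sort last."""
    for phase_index, (_, namespaces) in enumerate(PHASES):
        for ns_index, ns in enumerate(namespaces):
            if project_path == ns or project_path.startswith(ns + "/"):
                return (phase_index, ns_index)
    return (len(PHASES) - 1, len(PHASES[-1][1]))


def phase_of(project_path):
    """Return the phase name for a full project path like "tpo/web/blog"."""
    return PHASES[rank(project_path)[0]][0]


def migration_order(project_paths):
    """Sort paths by phase and namespace; sorted() keeps ties in input order."""
    return sorted(project_paths, key=rank)


def schedule_move(base_url, token, project_id, destination="gitaly-01"):
    """Ask GitLab to move one project's repository to another storage.

    Requires an administrator token. Returns the storage move record,
    whose "state" field can then be polled until it reaches "finished".
    """
    url = f"{base_url}/api/v4/projects/{project_id}/repository_storage_moves"
    request = urllib.request.Request(
        url,
        data=json.dumps({"destination_storage_name": destination}).encode(),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Matching on `ns + "/"` rather than a bare prefix keeps a hypothetical `tpo/tpa-other` project from being mistaken for part of `tpo/tpa`.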
Objections and exceptions
If you do not want any such disruption in your project, please let us
know before the deadline (2025-07-15) so we can skip it. However, we
would rather migrate all projects off the server, to simplify the
architecture and better understand the impact of the change.
In particular, we would like to migrate all tpo/applications
repositories in the coming week.
Conversely, if you want your project to be prioritized (it might mean
a performance improvement!), let us know and you can jump the queue!
Impact
Projects read-only during migration
While a project is being migrated, it is “read-only”: no changes can
be made to its Git repository.
We believe that other project features (like issues and comments)
should still work, but the upstream documentation is not entirely
clear on this:
To ensure data integrity, projects are put in a temporary read-only
state for the duration of the move. During this time, users receive
a “The repository is temporarily read-only. Please try again
later.” message if they try to push new commits.
So far, our test migrations have been so fast (a couple of seconds per
project) that we have not really been able to test this properly.
In practice, we don’t expect users to notice this migration at all.
In our tests, a 120MB repository was migrated in a couple of seconds,
so apart from very large repositories, most read-only periods should
last less than a minute.
We estimate that our largest repositories (the Firefox forks) will
take 5 to 10 minutes to migrate, and that moving everything between
the two servers would take less than 2 hours in total if it were
performed in one shot.
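Since a push rejected during a move can simply be retried a little later, scripts and CI jobs that push automatically could wrap `git push` in a retry loop. Here is a minimal Python sketch; it assumes the upstream message quoted above shows up in git’s standard error, and the function name, attempt count and delay are illustrative choices, not anything GitLab prescribes. For the largest repositories, which may take 5 to 10 minutes, a longer delay would be needed.

```python
import subprocess
import time

# Message quoted from the upstream GitLab documentation; we assume git
# relays it on standard error when a push hits a read-only repository.
READ_ONLY_MESSAGE = "The repository is temporarily read-only"


def push_with_retry(args=("git", "push"), attempts=5, delay=10.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `git push`, retrying only while the repository is read-only.

    Any other failure is returned immediately so it is not masked.
    `run` and `sleep` are injectable so the logic can be tested offline.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(list(args), capture_output=True, text=True)
        read_only = READ_ONLY_MESSAGE in (result.stderr or "")
        if result.returncode == 0 or not read_only or attempt == attempts:
            return result
        sleep(delay)  # give the storage move time to finish, then retry
```

Retrying only on the read-only message, rather than on every failure, avoids hammering the server when a push fails for an unrelated reason such as bad credentials.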
Additional complexity for TPA
TPA will need to get familiar with this new service. Installation
documentation is available and all the code developed to deploy the
service is visible in an internal merge request.
I understand this is a big change right before going on vacation, so
any TPA member can veto this and switch to the alternative, a
partial or on-demand migration.
Timeline
We plan on starting this work on July 15th, the coming Tuesday.
Hardware
Like the current Git repositories on gitlab-02, the Git repositories
on gitaly-01 will be hosted on NVMe disks.
Background
GitLab has had performance problems for a long time now, and for
almost as long, we’ve had a project to “scale GitLab to 2,000 users”
(tpo/tpa/team#40479). While we believe bots (and now, in particular,
Large Language Model (LLM) botnets) are responsible for much of that
load, our last performance incident concluded that there seems to be
a correlation between real usage and performance issues.
Indeed, during the July break, GitLab’s performance was stellar, and
on Monday, as soon as Europe came back from the break, performance
collapsed again. While it’s possible that bots follow the same
schedule as Tor people, we now feel it’s simply time to scale the
resources behind one of our most important services.
Gitaly is GitLab’s implementation of a Git server. It’s basically a
service that translates gRPC requests from the GitLab application
into Git operations. It currently runs on the same server as the main
GitLab app, but a new, dedicated server (gitaly-01) has been built,
and more servers could be built as needed.
Anarcat performed benchmarks showing equivalent or better
performance on the new Gitaly server, even when affected by the load
of the current GitLab server. The new server is expected to reduce
the load on the main GitLab server, but it’s not yet clear by how
much.
We hope this new architecture will give us more flexibility to deploy
more such back-ends in the future and to isolate performance issues,
which should improve diagnostics. It’s part of the normal roadmap for
scaling a large GitLab installation such as ours.
Alternatives considered
Full read-only backups
We considered performing a full backup of all Git repositories before
the migration. Unfortunately, this would require putting all of
GitLab in read-only mode for the duration of the backup, which,
according to our tests, could take anywhere from 20 to 60 minutes: an
unacceptable downtime.
Note that we do, of course, have nightly backups of the GitLab
server, which is also backed by RAID-10 disk arrays on two different
servers. We’re only talking about a fully consistent Git backup here;
our normal backups (which can, rarely, be inconsistent and require
manual work to reconnect some refs) are typically sufficient
anyway. See tpo/tpa/team#40518 for a discussion on GitLab
backups.
Partial or on-demand migration
We also considered a more piecemeal approach, migrating only some
repositories. We worry that this approach would lead to confusion
about the real impact of the migration.
Still, if any TPA member feels strongly enough about this to veto
this proposal, we can take this path and migrate only a few
repositories instead.
We could, for example, migrate only the “alpha” targets and a few key
repositories in the tpo/applications and tpo/core groups (since
they’re prime crawler targets), and leave the mass migration for
later, after a longer test period.
References and discussions
See the discussion issue for comments and more background.