[tor-project] gitolite to gitlab migration completed (TPA-RFC-36)

Hi again everyone!

This is the last update of the Gitolite migration. It's a little more
detailed than previous updates, so I made it in a blog post:

... of which I attach a copy here, if you're the kind of person who
prefers reading email to the web. :)

Enjoy!

Edit: note that the blog post is also discussed here, on Discourse:


----

Tor has finally completed a long migration from legacy Git
infrastructure ([Gitolite and GitWeb]) to our self-hosted
[GitLab] server.

[GitLab]: gitlab · Wiki · The Tor Project / TPA / TPA team · GitLab
[Gitolite and GitWeb]: git · Wiki · The Tor Project / TPA / TPA team · GitLab

Git repository addresses have therefore changed. Many of you probably
have made the switch already, but if not, you will need to change:

    https://git.torproject.org/

to:

    https://gitlab.torproject.org/

in your Git configuration.

The [GitWeb front page] is now an archived listing of all the
repositories before the migration. Inactive Git repositories were
archived in the GitLab [legacy/gitolite namespace], and the
`gitweb.torproject.org` and `git.torproject.org` web sites now
redirect to GitLab.

[legacy/gitolite namespace]: gitolite · GitLab
[GitWeb front page]: https://gitweb.torproject.org/

A best effort was made to reproduce the original Gitolite repositories
faithfully while avoiding duplicating too much data in the
migration. But it's *possible* that some data present in Gitolite was
not migrated to GitLab.

User repositories are particularly at risk, because they were
migrated in bulk and "re-forked" from their upstreams to avoid
wasting disk space. If a user had a project with a matching name,
it was *assumed* to have the right data, which might be inaccurate.

The two virtual machines responsible for the legacy service (`cupani`
for `git-rw.torproject.org` and `vineale` for `git.torproject.org` and
`gitweb.torproject.org`) have been shut down. Their disks will remain
for 3 months (until the end of July 2024) and their backups for
another year after that (until the end of July 2025), after which
point all the data from those hosts will be destroyed, with only the
GitLab archives remaining.

The rest of this article expands on how this was done and what kind of
problems we faced during the migration.

# Where is the code?

Normally, nothing should be lost. All repositories in gitolite have
been either explicitly migrated by their owners, forcibly migrated by
the sysadmin team ([TPA]), or explicitly destroyed at their owner's
request.

[TPA]: The Tor Project / TPA / TPA team · GitLab

An exhaustive [rewrite map] translates gitolite projects to GitLab
projects. Some of those projects actually redirect to their *parent*
in cases of empty repositories that were obvious forks. Destroyed
repositories redirect to the GitLab front page.

[rewrite map]: https://archive.torproject.org/websites/gitolite2gitlab.txt
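
For readers who want to translate old clone URLs in bulk, here is a minimal Python sketch that reads such a map. It is not taken from our tooling. It assumes the file follows the plain Apache `txt:` RewriteMap layout (one whitespace-separated "Gitolite path, GitLab path" pair per line, `#` for comments) and that keys are Gitolite paths without the `.git` suffix, which is what the rewrite rules at the end of this article suggest. The function names are hypothetical, and the two sample entries are modeled on the migration transcripts shown later.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

GITLAB = "https://gitlab.torproject.org"

# Sample entries only; the real map is much larger.
SAMPLE = """\
# Gitolite path (without .git)   GitLab path
project/tor-browser/fastlane legacy/gitolite/project/tor-browser/fastlane
user/phw/obfsproxy legacy/gitolite/user/phw/obfsproxy
"""


def parse_rewrite_map(text: str) -> Dict[str, str]:
    """Parse "key value" lines, skipping blank lines and comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def translate(url: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the GitLab clone URL for a legacy URL, or None if unmapped."""
    path = urlsplit(url).path.strip("/")
    if path.startswith("git/"):
        path = path[len("git/"):]  # the old site also answered under /git/
    if path.endswith(".git"):
        path = path[: -len(".git")]
    target = mapping.get(path)
    return f"{GITLAB}/{target}.git" if target else None


if __name__ == "__main__":
    rewrite_map = parse_rewrite_map(SAMPLE)
    print(translate("https://git.torproject.org/user/phw/obfsproxy.git", rewrite_map))
```

A `None` result means the path is not in the map, in which case the web redirects fall back to the GitLab front page.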

Because the migration happened progressively, it's technically
possible that commits pushed to Gitolite after a repository was
migrated were lost. We took great care to avoid that scenario. First, we
adopted a proposal ([TPA-RFC-36]) in June 2023 to announce the
transition. Then, in [March 2024], we locked down all repositories
from any further changes. Around that time, only a [handful of
repositories] had changes made after the adoption date, and we
examined each repository carefully to make sure nothing was lost.

[handful of repositories]: review gitolite retirement progress and send a reminder (#41214) · Issues · The Tor Project / TPA / TPA team · GitLab "handful of repositories"
[March 2024]: lock down legacy git infrastructure (#41213) · Issues · The Tor Project / TPA / TPA team · GitLab
[TPA-RFC-36]: tpa rfc 36 gitolite gitweb retirement · Wiki · The Tor Project / TPA / TPA team · GitLab

Still, we built a [diff of all the changes in the git references]
that archivists can peruse to check for data loss. It's large (6MiB+)
because a lot of repositories were migrated before the mass migration
and then kept evolving in GitLab. Many other repositories were rebuilt
in GitLab from their parent to restore the fork relationship, which
added extra references to those clones.

[diff of all the changes in the git references]: forcibly migrate remaining Gitolite repositories to GitLab (#41215) · Issues · The Tor Project / TPA / TPA team · GitLab

A note to amateur archivists out there: it's probably too late for one
last crawl now. The Git repositories now all redirect to GitLab and
are effectively unavailable in their original form.

That said, the GitWeb site was crawled into the [Internet Archive] [in
February 2024], so at least some copy of it is available in the
[Wayback Machine]. At that point, however, many developers had already
migrated their projects to GitLab, so the copies there were already
possibly out of date compared with the repositories in GitLab.

[Wayback Machine]: gitweb.torproject.org
[in February 2024]: retire vineale (#41218) · Issues · The Tor Project / TPA / TPA team · GitLab
[Internet Archive]: https://archive.org/

[Software Heritage] also has a copy of all repositories hosted on
Gitolite [since June 2023] and has continuously mirrored the
repositories since then, where they will hopefully be kept for
eternity. There's an [issue] where the main website can't find the
repositories when you search for `gitweb.torproject.org`; instead,
[search for `git.torproject.org`].

[search for `git.torproject.org`]: Search software origins to browse – Software Heritage archive
[issue]: website can't find gitweb.torproject.org repositories even though it has been scraped (#4787) · Issues · Platform / Development / swh-web · GitLab
[since June 2023]: Add forge now - Process https://gitweb.torproject.org/ (#4939) · Issues · Platform / Infrastructure / sysadm-environment · GitLab
[Software Heritage]: https://www.softwareheritage.org/

In any case, if you believe data is missing, please do let us know by
[opening an issue with TPA].

[opening an issue with TPA]: Sign in · GitLab

# Why?

This project has been a long time in the making. The first [discussion
about migrating from gitolite to GitLab] started in 2020 (almost 4
years ago). But [going further back], the first GitLab experiment was
in 2016, almost a decade ago.

[going further back]: trac · Wiki · The Tor Project / TPA / TPA team · GitLab

[discussion about migrating from gitolite to GitLab]: draft TPA-RFC-36: establish policy on git repository mirroring, hosting and, ultimately migration from gitolite (#40472) · Issues · The Tor Project / TPA / TPA team · GitLab

The current GitLab server dates from 2019, [replacing Trac for issue
tracking in 2020]. It was originally supposed to host only mirrors
for merge requests and issue trackers but, naturally, one thing led to
another and eventually, GitLab had grown a container registry,
continuous integration (CI) runners, GitLab Pages, and, of course,
hosted most Git repositories.

[replacing Trac for issue tracking in 2020]: From Trac into Gitlab for Tor | The Tor Project

There was some hesitation about moving to GitLab for code hosting. We
had [discussions about the increased attack surface] and [ways to
mitigate that] but, ultimately, it seems the issues were not that
serious and the community embraced GitLab.

[ways to mitigate that]: Investigate push-signing and transparency logs as mitigation for repository attacks (#98) · Issues · The Tor Project / TPA / Gitlab · GitLab
[discussions about the increased attack surface]: evaluate mitigation strategies to work around GitLab's attack surface for git hosting (#81) · Issues · The Tor Project / TPA / Gitlab · GitLab

TPA actually migrated its most critical repositories out of shared
hosting entirely, into specific servers (e.g. the Puppet Git
repository is just on the Puppet server now), leveraging Git's
decentralized nature and removing an entire attack surface from our
infrastructure. Some of those repositories are *mirrored* back into
GitLab, but the authoritative copy is not on GitLab.

In any case, the proposal to migrate from Gitolite to GitLab was
effectively just formalizing a *fait accompli*.

# How to migrate from Gitolite / cgit to GitLab

The progressive migration was a challenge. If you intend to migrate
between hosting platforms, we strongly recommend setting a "flag day"
during which you migrate *all* repositories *at once*. This ensures a
smoother transition and avoids elaborate rewrite rules.

When Gitolite access was shut down, we had repositories on both GitLab
and Gitolite, without a clear relationship between the two. The
initial plan was to import all the remaining Gitolite repositories
into the `legacy/gitolite` namespace, but that seemed wasteful,
particularly for large repositories like [Tor Browser], which uses
nearly a gigabyte of disk space. So we took special care to avoid
duplicating repositories.

[Tor Browser]: The Tor Project / Applications / Tor Browser · GitLab

When the [mass migration] started, only 71 of the 538 Gitolite
repositories were marked `Migrated to GitLab` in the `gitolite.conf`
file. So, given that we had *hundreds* of repositories to migrate, we
developed some automation to "[save time]". We already automate
similar ad-hoc tasks with [Fabric], so we used that framework here
as well. (Our normal configuration management tool is [Puppet],
which is a poor fit here.)

[Puppet]: puppet · Wiki · The Tor Project / TPA / TPA team · GitLab
[Fabric]: fabric · Wiki · The Tor Project / TPA / TPA team · GitLab
[save time]: xkcd: Is It Worth the Time?
[mass migration]: forcibly migrate remaining Gitolite repositories to GitLab (#41215) · Issues · The Tor Project / TPA / TPA team · GitLab

So a relatively [large amount of Python code] was produced to
basically do the following:

[large amount of Python code]: fabric_tpa/gitolite.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab

1. check that all on-disk repositories are listed in `gitolite.conf`
    (and vice versa), and either add missing repositories or delete
    them from disk if they are garbage
2. for each repository in `gitolite.conf`, skip it if its category is
    marked `Migrated to GitLab`; otherwise:
3. find a matching GitLab project by name, prompting the user when
    there are multiple matches
4. if a match is found, redirect if the repository is non-empty
    * we have GitLab projects that *look* like the real thing, but are
    only present to host migrated Trac issues
    * in such cases we cloned the Gitolite project locally and pushed
    to the existing repository instead
5. otherwise, a new repository is created in the `legacy/gitolite`
    namespace, using the "import" mechanism in GitLab to automatically
    import the repository from Gitolite, creating redirections and
    updating `gitolite.conf` to document the change
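
The decision flow in the list above can be sketched roughly as follows. This is an illustration written for this article, not an excerpt from the real Fabric task: the `Repo` class, the function names, and the returned action strings are all hypothetical, and it reads "non-empty" in step 4 as referring to the GitLab project's Git repository, which is one interpretation of the list.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

MIGRATED = "Migrated to GitLab"


@dataclass
class Repo:
    path: str      # Gitolite path, e.g. "project/tor-browser/fastlane"
    category: str  # the gitweb.category value from gitolite.conf


def check_consistency(on_disk: Set[str], configured: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Step 1: return (on disk but not configured, configured but not on disk)."""
    return on_disk - configured, configured - on_disk


def basename(path: str) -> str:
    return path.rsplit("/", 1)[-1]


def plan(repo: Repo, gitlab_projects: Dict[str, bool]) -> str:
    """Steps 2 to 5: decide what to do with one Gitolite repository.

    gitlab_projects maps a GitLab project path to True when its Git
    repository already holds commits, and to False when it is empty.
    """
    if repo.category == MIGRATED:
        return "skip"  # step 2
    # step 3: match on the last path component
    matches = sorted(p for p in gitlab_projects if basename(p) == basename(repo.path))
    if len(matches) > 1:
        return "prompt: " + ", ".join(matches)
    if matches and gitlab_projects[matches[0]]:
        return "redirect: " + matches[0]  # step 4
    if matches:
        return "push: " + matches[0]  # step 4, project that only hosts issues
    return "import: legacy/gitolite/" + repo.path  # step 5


if __name__ == "__main__":
    projects = {"tpo/anti-censorship/pluggable-transports/obfsproxy": True}
    print(plan(Repo("project/tor-browser/fastlane", "Attic"), projects))
```

Keeping the decision separate from the side effects (API calls, pushes, configuration edits) makes this kind of logic easy to dry-run before touching any repository.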

User repositories (those under the `user/` directory in Gitolite) were
handled specially. First, the existing redirection map was checked to
see if a similarly named project was migrated (so that,
e.g. `user/dgoulet/tor` is properly treated as a fork of
`tpo/core/tor`). Then the parent project was forked in GitLab and the
Gitolite project force-pushed to the fork. This allows us to show the
fork relationship in GitLab and, more importantly, benefit from the
"pool" feature in GitLab which deduplicates disk usage between forks.

Sometimes, we found no such relationships. Then we simply imported
multiple repositories with similar names in the `legacy/gitolite`
namespace, sometimes creating forks between user repositories, on a
first-come-first-served basis from the `gitolite.conf` order.
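
The parent lookup for user repositories can be pictured with the short sketch below. It is a simplification under stated assumptions: the redirection map is modeled as a plain dictionary from Gitolite path to GitLab path, "similarly named" is reduced to "same last path component", and the sample entries (including the keys) are made up for illustration. The function name is hypothetical.

```python
from typing import Dict, List


def fork_candidates(user_repo: str, rewrite_map: Dict[str, str]) -> List[str]:
    """Return GitLab projects that a Gitolite user repository could be forked from."""
    name = user_repo.rsplit("/", 1)[-1]
    candidates = {
        gitlab_path
        for gitolite_path, gitlab_path in rewrite_map.items()
        if gitolite_path != user_repo and gitlab_path.rsplit("/", 1)[-1] == name
    }
    return sorted(candidates)


# Hypothetical entries in the "Gitolite path -> GitLab path" shape.
REWRITE_MAP = {
    "tor": "tpo/core/tor",
    "user/asn/obfsproxy": "legacy/gitolite/user/asn/obfsproxy",
    "pluggable-transports/obfsproxy": "tpo/anti-censorship/pluggable-transports/obfsproxy",
}

if __name__ == "__main__":
    print(fork_candidates("user/dgoulet/tor", REWRITE_MAP))
    print(fork_candidates("user/phw/obfsproxy", REWRITE_MAP))
```

With a single candidate the parent is unambiguous. With several, as in the second call, a human has to pick, which is why the real task prompts the operator.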

The code used in this migration is now available publicly. We
encourage other groups planning to migrate from Gitolite/GitWeb to
GitLab to use (and contribute to) our [fabric-tasks] repository,
even though it does have its fair share of hard-coded assertions.

[fabric-tasks]: The Tor Project / TPA / Fabric Tasks · GitLab

The main entry point is the `gitolite.mass-repos-migration` task. A
typical migration job looked like:

anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration 
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] 
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]

In the above, you can see migrated repositories being skipped, then
the [fastlane project] being archived into GitLab. Here is another
example with a later version of the script, processing only user
repositories and showing the interactive prompt and a force-push into
a fork:

[fastlane project]: Legacy / gitolite / project / tor-browser / fastlane · GitLab

$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] 
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> => {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote: 
remote: To create a merge request for bug_10887, visit:        
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887        
remote: 
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] 
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]

Sharp eyes will also notice the [bell used as a notification mechanism]
in this transcript.

[bell used as a notification mechanism]: Using the bell as modern notification - anarcat

A lot of the code is now useless to us, but some, like "commit and
push" or [`is-repo-empty`], lives on in the [git module] and, of
course, the [gitlab module] has grown some legs along the
way. We've also found fun bugs, like a [file descriptor exhaustion in
bash], among other oddities. The [retirement milestone] and
[issue 41215] have a detailed log of the migration, for those
curious.

[issue 41215]: forcibly migrate remaining Gitolite repositories to GitLab (#41215) · Issues · The Tor Project / TPA / TPA team · GitLab
[retirement milestone]: legacy Git infrastructure retirement (TPA-RFC-36) · TPA · GitLab
[file descriptor exhaustion in bash]: #642504 - bash: file number exhaustion on certain redirections in loop: "Too many open files" - Debian Bug report logs
[gitlab module]: fabric_tpa/gitlab.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab
[git module]: fabric_tpa/git.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab
[`is-repo-empty`]: fabric_tpa/git.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab

This was a challenging project, but it feels nice to have this behind
us. This gets rid of 2 of the 4 remaining machines running Debian
"old-old-stable", which moves us a bit further ahead in our late
[bullseye upgrades milestone].

[bullseye upgrades milestone]: Debian 11 bullseye upgrade · TPA · GitLab

Full transparency: we tested GPT-3.5, GPT-4, and other large language
models to see if they could answer the question "write a set of
rewrite rules to redirect GitWeb to GitLab". This has become a
standard LLM test for your faithful writer to figure out how good an
LLM is at technical responses. None of them gave an accurate,
complete, and functional response, for the record.

The actual rewrite rules as of this writing follow, for humans who
actually like working answers provided by expert humans instead of
artificial intelligences, which currently seem to be glorified,
mansplaining interns.

## git.torproject.org rewrite rules

Those rules are relatively simple in that they rewrite a single URL to
its equivalent GitLab counterpart in a 1:1 fashion. They rely on the
[rewrite map] mentioned above, of course.

[rewrite map]: https://archive.torproject.org/websites/gitolite2gitlab.txt

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case

# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after poring over the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]

# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]
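
To reason about what these rules do without a running Apache, the condition, rule, and fallback above can be approximated in a few lines of Python. This is only a sketch for experimentation: it reuses the same regular expression (unescaped dots included), works on the URL path alone, and takes the map as a plain dictionary with one sample entry modeled on the transcripts above. The `redirect` function is hypothetical and not part of any deployed code.

```python
import re
from typing import Dict

GITLAB = "https://gitlab.torproject.org"

# Same pattern as the RewriteCond/RewriteRule pair above.
PATTERN = re.compile(r"^/(git/)?(.*).git/(.*)$")


def redirect(path: str, rewrite_map: Dict[str, str]) -> str:
    """Return the redirect target for a request path on git.torproject.org."""
    match = PATTERN.match(path)
    # Mirrors the "!NOT_FOUND" condition: only rewrite known projects.
    if match and match.group(2) in rewrite_map:
        return f"{GITLAB}/{rewrite_map[match.group(2)]}.git/{match.group(3)}"
    return GITLAB  # the fallback rule


if __name__ == "__main__":
    sample = {"user/phw/obfsproxy": "legacy/gitolite/user/phw/obfsproxy"}
    print(redirect("/user/phw/obfsproxy.git/info/refs", sample))
    print(redirect("/user/phw/obfsproxy", sample))
```

The second call shows the limitation noted in the WARNING comment: a path without `.git` in it does not match and falls through to the GitLab front page.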

## gitweb.torproject.org rewrite rules

Those are the vastly more complicated GitWeb to GitLab rewrite
rules.

Note that we say "GitWeb" but we were actually *not* running
[GitWeb] but [cgit], as the former didn't actually scale for us.

[cgit]: cgit - A hyperfast web frontend for git repositories written in C.
[GitWeb]: Git - gitweb Documentation

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"

# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]

# list of endpoints taken from cgit's cmd.c

# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# summary
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# about
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# commit
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]

# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]

# patch
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]

# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]

# log
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]

# atom
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]

# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]

# tree
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]

# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]

# plain
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]

# blame: disabled
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteCond %{QUERY_STRING} h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]

# stats
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]

# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side

# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]

The reference copy of those is available in our (currently private)
Puppet git repository.
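
For anyone adapting these rules, a small Python model of the expected redirects can serve as a checklist while testing. The sketch below covers only the `commit` and `tree` endpoints and simplifies the patterns (the dot in `.git` is escaped and the `id` parameter is matched more strictly than in the Apache rules), so treat it as an approximation rather than an exact port. The function names are hypothetical, and the map entry and commit ID are borrowed from the transcript earlier in this article.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlsplit

GITLAB = "https://gitlab.torproject.org"


def query_param(name: str, query: str) -> Optional[str]:
    """Return the raw value of one query-string parameter, or None."""
    match = re.search(r"(?:^|&)" + re.escape(name) + r"=([^&]*)", query)
    return match.group(1) if match else None


def cgit_redirect(url: str, rewrite_map: Dict[str, str]) -> Optional[str]:
    """Translate a cgit commit or tree URL, or return None for anything else."""
    parts = urlsplit(url)
    match = re.match(r"^/(.*)\.git/(commit|tree)(/.*)?$", parts.path)
    if not match or match.group(1) not in rewrite_map:
        return None
    base = f"{GITLAB}/{rewrite_map[match.group(1)]}/-"
    ref = query_param("id", parts.query)
    if match.group(2) == "commit":
        # With an id, link to that commit; otherwise show the history of HEAD.
        return f"{base}/commit/{ref}" if ref else f"{base}/commits/HEAD"
    # tree: keep the path inside the repository and default to HEAD.
    return f"{base}/tree/{ref or 'HEAD'}{match.group(3) or ''}"


if __name__ == "__main__":
    sample = {"user/phw/obfsproxy": "legacy/gitolite/user/phw/obfsproxy"}
    old = "https://gitweb.torproject.org/user/phw/obfsproxy.git/commit/?id=a8e54d5"
    print(cgit_redirect(old, sample))
```

Comparing such a model against the `Location` header actually returned by the web server is a cheap way to catch regressions when the rules are edited.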

--
Antoine Beaupré
torproject.org system administration

Congratulations on reaching this milestone!


On 5/1/24 8:16 AM, Antoine Beaupré wrote:

Hi again everyone!

This is the last update of the Gitolite migration. It's a little more
detailed than previous updates, so I made it in a blog post:

Tor migrates from Gitolite/GitWeb to GitLab | The Tor Project


An exhaustive [rewrite map] translates gitolite projects to GitLab
projects. Some of those projects actually redirect to their *parent*
in cases of empty repositories that were obvious forks. Destroyed
repositories redirect to the GitLab front page.

  [rewrite map]: https://archive.torproject.org/websites/gitolite2gitlab.txt

Because the migration happened progressively, it's technically
possible that commits pushed to gitolite were lost after the
migration. We took great care to avoid that scenario. First, we
adopted a proposal ([TPA-RFC-36]) in June 2023 to announce the
transition. Then, in [March 2024], we locked down all repositories
from any further changes. Around that time, only a [handful of
repositories] had changes made after the adoption date, and we
examined each repository carefully to make sure nothing was lost.

  [handful of repositories]: review gitolite retirement progress and send a reminder (#41214) · Issues · The Tor Project / TPA / TPA team · GitLab "handful of repositories"
  [March 2024]: lock down legacy git infrastructure (#41213) · Issues · The Tor Project / TPA / TPA team · GitLab
  [TPA-RFC-36]: tpa rfc 36 gitolite gitweb retirement · Wiki · The Tor Project / TPA / TPA team · GitLab

Still, we built a [diff of all the changes in the git references]
that archivists can peruse to check for data loss. It's large (6MiB+)
because a lot of repositories were migrated before the mass migration
and then kept evolving in GitLab. Many other repositories were rebuilt
in GitLab from parent to rebuild a fork relationship which added extra
references to those clones.

  [diff of all the changes in the git references]: forcibly migrate remaining Gitolite repositories to GitLab (#41215) · Issues · The Tor Project / TPA / TPA team · GitLab

A note to amateur archivists out there, it's probably too late for one
last crawl now. The Git repositories now all redirect to GitLab and
are effectively unavailable in their original form.

That said, the GitWeb site was crawled into the [Internet Archive] [in
February 2024], so at least some copy of it is available in the
[Wayback Machine]. At that point, however, many developers had already
migrated their projects to GitLab, so the copies there were already
possibly out of date compared with the repositories in GitLab.

  [Wayback Machine]: gitweb.torproject.org
  [in February 2024]: retire vineale (#41218) · Issues · The Tor Project / TPA / TPA team · GitLab
  [Internet Archive]: https://archive.org/

[Software Heritage] also has a copy of all repositories hosted on
Gitolite [since June 2023] and have continuously kept mirroring the
repositories, where they will be kept hopefully in eternity. There's
an [issue] where the main website can't find the repositories when
you search for `gitweb.torproject.org`, instead [search for
`git.torproject.org`].

  [search for `git.torproject.org`]: Search software origins to browse – Software Heritage archive
  [issue]: website can't find gitweb.torproject.org repositories even though it has been scraped (#4787) · Issues · Platform / Development / swh-web · GitLab
  [since June 2023]: Add forge now - Process https://gitweb.torproject.org/ (#4939) · Issues · Platform / Infrastructure / sysadm-environment · GitLab
  [Software Heritage]: https://www.softwareheritage.org/

In any case, if you believe data is missing, please do let us know by
[opening an issue with TPA].

  [opening an issue with TPA]: Sign in · GitLab

# Why?

This is an old project in the making. The first [discussion about
migrating from gitolite to GitLab] started in 2020 (almost 4 years
ago). But [going further back], the first GitLab experiment was in
2016, almost a decade ago.

  [going further back]: trac · Wiki · The Tor Project / TPA / TPA team · GitLab

  [discussion about migrating from gitolite to GitLab]: draft TPA-RFC-36: establish policy on git repository mirroring, hosting and, ultimately migration from gitolite (#40472) · Issues · The Tor Project / TPA / TPA team · GitLab

The current GitLab server dates from 2019, [replacing Trac for issue
tracking in 2020]. It was originally supposed to host only mirrors
for merge requests and issue trackers but, naturally, one thing led to
another and eventually, GitLab had grown a container registry,
continuous integration (CI) runners, GitLab Pages, and, of course,
hosted most Git repositories.

  [replacing Trac for issue tracking in 2020]: From Trac into Gitlab for Tor | The Tor Project

There were hesitations at moving to GitLab for code hosting. We had
[discussions about the increased attack surface] and [ways to
mitigate that], but, ultimately, it seems the issues were not that
serious and the community embraced GitLab.

  [ways to mitigate that]: Investigate push-signing and transparency logs as mitigation for repository attacks (#98) · Issues · The Tor Project / TPA / Gitlab · GitLab
  [discussions about the increased attack surface]: evaluate mitigation strategies to work around GitLab's attack surface for git hosting (#81) · Issues · The Tor Project / TPA / Gitlab · GitLab

TPA actually migrated its most critical repositories out of shared
hosting entirely, into specific servers (e.g. the Puppet Git
repository is just on the Puppet server now), leveraging Git's
decentralized nature and removing an entire attack surface from our
infrastructure. Some of those repositories are *mirrored* back into
GitLab, but the authoritative copy is not on GitLab.

In any case, the proposal to migrate from Gitolite to GitLab was
effectively just formalizing a *fait accompli*.

# How to migrate from Gitolite / cgit to GitLab

The progressive migration was a challenge. If you intend to migrate
between hosting platforms, we strongly recommend to make a "flag day"
during which you migrate *all* repositories *at once*. This ensures a
smoother transition and avoids elaborate rewrite rules.

When Gitolite access was shutdown, we had repositories on both GitLab
and Gitolite, without a clear relationship between the two. A priori,
the plan then was to import all the remaining Gitolite repositories
into the `legacy/gitolite` namespace, but that seemed wasteful,
particularly for large repositories like [Tor Browser] which uses
nearly a gigabyte of disk space. So we took special care to avoid
duplicating repositories.

  [Tor Browser]: The Tor Project / Applications / Tor Browser · GitLab

When the [mass migration] started, only 71 of the 538 Gitolite
repositories were `Migrated to GitLab` in the `gitolite.conf`
file. So, given that we had *hundreds* of repositories to migrate:, we
developed some automation to "[save time]". We already automate
similar ad-hoc tasks with [Fabric], so we used that framework here
as well. (Our normal configuration management tool is [Puppet],
which is a poor fit here.)

  [Puppet]: puppet · Wiki · The Tor Project / TPA / TPA team · GitLab
  [Fabric]: fabric · Wiki · The Tor Project / TPA / TPA team · GitLab
  [save time]: xkcd: Is It Worth the Time?
  [mass migration]: forcibly migrate remaining Gitolite repositories to GitLab (#41215) · Issues · The Tor Project / TPA / TPA team · GitLab

So a relatively [large amount of Python code] was produced to
basically do the following:

  [large amount of Python code]: fabric_tpa/gitolite.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab

  1. check if all on-disk repositories are listed in `gitolite.conf`
     (and vice versa) and either add missing repositories or delete
     them from disk if garbage
  2. for each repository in `gitolite.conf`, if its category is marked
     `Migrated to GitLab`, skip, otherwise;
  3. find a matching GitLab project by name, prompt the user for
     multiple matches
  4. if a match is found, redirect if the repository is non-empty
     * we have GitLab projects that *look* like the real thing, but are
     only present to host migrated Trac issues
     * in such cases we cloned the Gitolite project locally and pushed
     to the existing repository instead
  5. otherwise, a new repository is created in the `legacy/gitolite`
     namespace, using the "import" mechanism in GitLab to automatically
     import the repository from Gitolite, creating redirections and
     updating `gitolite.conf` to document the change

User repositories (those under the `user/` directory in Gitolite) were
handled specially. First, the existing redirection map was checked to
see if a similarly named project was migrated (so that,
e.g. `user/dgoulet/tor` is properly treated as a fork of
`tpo/core/tor`). Then the parent project was forked in GitLab and the
Gitolite project force-pushed to the fork. This allows us to show the
fork relationship in GitLab and, more importantly, benefit from the
"pool" feature in GitLab which deduplicates disk usage between forks.

Sometimes, we found no such relationships. Then we simply imported
multiple repositories with similar names in the `legacy/gitolite`
namespace, sometimes creating forks between user repositories, on a
first-come-first-served basis from the `gitolite.conf` order.

The code used in this migration is now available publicly. We
encourage other groups planning to migrate from Gitolite/GitWeb to
GitLab to use (and contribute to) our [fabric-tasks] repository,
even though it does have its fair share of hard-coded assertions.

  [fabric-tasks]: The Tor Project / TPA / Fabric Tasks · GitLab

The main entry point is the `gitolite.mass-repos-migration` task. A
typical migration job looked like:

anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n]
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]

In the above, you can see migrated repositories skipped then the
[fastlane project] being archived into GitLab. Another example with
a later version of the script, processing only user repositories and
showing the interactive prompt and a force-push into a fork:

  [fastlane project]: Legacy / gitolite / project / tor-browser / fastlane · GitLab

$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n]
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> => {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote:
remote: To create a merge request for bug_10887, visit:
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887
remote:
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
  + 2bf9d09...a8e54d5 master -> master (forced update)
  * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n]
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]

Acute eyes will notice the [bell used as a notification mechanism]
as well in this transcript.

  [bell used as a notification mechanism]: Using the bell as modern notification - anarcat

A lot of the code is now useless for us, but some, like "commit and
push" or [`is-repo-empty`] live on in the [git module] and, of
course, the [gitlab module] has grown some legs along the
way. We've also found fun bugs, like a [file descriptor exhaustion in
bash], among other oddities. The [retirement milestone] and
[issue 41215] has a detailed log of the migration, for those
curious.

  [issue 41215]: forcibly migrate remaining Gitolite repositories to GitLab (#41215) · Issues · The Tor Project / TPA / TPA team · GitLab
  [retirement milestone]: legacy Git infrastructure retirement (TPA-RFC-36) · TPA · GitLab
  [file descriptor exhaustion in bash]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642504
  [gitlab module]: fabric_tpa/gitlab.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab
  [git module]: fabric_tpa/git.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab
  [`is-repo-empty`]: fabric_tpa/git.py · 85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76 · The Tor Project / TPA / Fabric Tasks · GitLab

This was a challenging project, but it feels nice to have this behind
us. This gets rid of 2 of the 4 remaining machines running Debian
"old-old-stable", which moves a bit further ahead in our late
[bullseye upgrades milestone].

  [bullseye upgrades milestone]: Debian 11 bullseye upgrade · TPA · GitLab

Full transparency: we tested GPT-3.5, GPT-4, and other large language
models to see if they could answer the question "write a set of
rewrite rules to redirect GitWeb to GitLab". This has become a
standard LLM test for your faithful writer to figure out how good a
LLM is at technical responses. None of them gave an accurate,
complete, and functional response, for the record.

The actual rewrite rules as of this writing follow, for humans that
actually like working answers provided by expert humans instead of
artificial intelligence which currently seem to be, glorified,
mansplaining interns.

## git.torproject.org rewrite rules

Those rules are relatively simple in that they rewrite a single URL to
its equivalent GitLab counterpart in a 1:1 fashion. It relies on the
[rewrite map] mentioned above, of course.

  [rewrite map]: https://archive.torproject.org/websites/gitolite2gitlab.txt

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case

# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after pouring around the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]

# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

## gitweb.torproject.org rewrite rules

Those are the vastly more complicated GitWeb to GitLab rewrite
rules.

Note that we say "GitWeb" but we were actually *not* running
[GitWeb] but [cgit], as the former didn't actually scale for us.

  [cgit]: cgit - A hyperfast web frontend for git repositories written in C.
  [GitWeb]: Git - gitweb Documentation

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"

# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]

# list of endpoints taken from cgit's cmd.c

# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# summary
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# about
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# commit
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]

# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]

# patch
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]

# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]

# log
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]

# atom
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]

# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]

# tree
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]

# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]

# plain
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]

# blame: disabled
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteCond %{QUERY_STRING} h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]

# stats
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]

# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side

# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]

The reference copy of those is available in our (currently private)
Puppet git repository.

