[tor-project] TPA-RFC-58: Podman CI runner deployment, help needed

Summary: I deployed a new GitLab CI runner backed by Podman instead of
Docker. We hope it will improve stability and our capacity for
building images, but I need help testing it.

Background

We’ve been having stability issues with the Docker runners for a
while now. We also started looking again at container image builds,
which currently fail unless Kaniko is used.

Proposal

Testers needed

I need help testing the new runner. Right now it’s marked as not
running “untagged jobs”, so it’s unlikely to pick up your CI jobs
unless you tag them explicitly.

See the GitLab tag documentation for how to add tags to your
configuration. It’s basically done by adding a tags field to the
.gitlab-ci.yml file.

Note that in TPA’s ci-test gitlab-ci.yaml file, we use a
TPA_TAG_VALUE variable to be able to pass arbitrary tags down into
the jobs without having to constantly change the .yaml file, which
might be a useful addition to your workflow.

The tag to use is podman.
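
As a minimal sketch of the two approaches described above (the job
names and script lines are hypothetical, and the second job only
assumes how a variable like TPA_TAG_VALUE could be wired in; it is not
copied from the ci-test file), a .gitlab-ci.yml might look like this:

```yaml
# Hypothetical job names and script lines, for illustration only.

# Option 1: hard-code the tag on a job.
test-on-podman:
  tags:
    - podman
  script:
    - echo "running on the podman runner"

# Option 2: pass the tag through a variable, so that it can be
# overridden when triggering a pipeline without editing this file.
# This is an assumed layout, not the actual ci-test configuration.
variables:
  TPA_TAG_VALUE: podman

test-with-variable-tag:
  tags:
    - $TPA_TAG_VALUE
  script:
    - echo "running on a runner tagged $TPA_TAG_VALUE"
```

With the second form, setting TPA_TAG_VALUE to another value when
starting a pipeline should send the job to a different runner without
a commit.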

You can send any job you want to the podman runner. We’d like to
test a broad variety of things before we put it in production,
especially image builds. Upstream even has a set of instructions
to build packages inside podman.

Long term plan

If this goes well, we’d like to converge towards using podman for
all workloads. It’s better packaged in Debian, and better designed,
than Docker. It also allows us to run containers as non-root.

That, however, is not part of this proposal. We’re already running
Podman for another service (MinIO) but we’re not proposing to
convert all existing services to podman. If things work well
enough for a long enough period (say 30 days), we might turn off the
older Docker runner instead.

Alternatives considered

To fix the stability issues in Docker, it might be possible to upgrade
to the latest upstream package and abandon the packages from
Debian.org. We’re hoping that will not be necessary thanks to Podman.

To build images, we could create a “privileged” runner. For now, we’re
hoping Podman will make building container images easier. If we do
create a privileged runner, it needs to take into account the long
term tiered runner approach.

Deadline

The service is already available, and will be running untagged jobs in
two weeks unless an objection is raised.

Status

This proposal is currently in the proposed state.

References

Feedback can be provided in the discussion issue.

Update on this: I added the `tpa` tag earlier this week so that the
runner would pick up our nightly test jobs. Today I've also added the
`amd64` tag to unblock a test pipeline nickm graciously sent our
way.

I'm happy to announce that both tests are doing well and we're on track
to enable the runner for all jobs this coming Wednesday.

Also note that I did an extensive amount of work on the GitLab CI
dashboard, which now also features queue wait times:

https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab-ci-overview

user: tor-guest, no password, as usual.

That should allow you to answer the question of "is it just me or is CI
taking forever to pick up my job?". In the two days we have had those
stats, all jobs were picked up within one minute of being queued.

Feedback is, as usual, welcome, either here or:

Thank you for your attention!

a.

--
Antoine Beaupré
torproject.org system administration
_______________________________________________
tor-project mailing list
tor-project@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project

Reminder: if there are no objections, the podman runner will be online
for all jobs *tomorrow*.

As an aside, GitLab will soon run its 100 *thousandth* pipeline, so I
guess it's a good way to celebrate, with a fresh new runner! :)

a.

The podman runner (with the cute name of `ci-runner-x86-02`) is now live
and accepts "untagged" jobs or jobs tagged with one of: kvm, linux,
debug-terminal, x86_64, x86-64, 16 CPU, 94.30 GiB, amd64, podman, tpa.

Please do notify us if any (unusual) problem occurs. We'll also be
monitoring this through the Grafana dashboard:

https://grafana.torproject.org/d/fd0b2fb2-88d0-4f85-bc86-16164c083b51/gitlab-ci-overview

user: tor-guest, no password

Also, according to that dashboard, we're grossly over capacity now,
which means one of two things:

1. we need to retire a runner
2. YOU need to RUN MORE CI! :)

I tend towards the latter...

Also remember that you can bring your own runners, any computer can be
repurposed into a GitLab runner to pick up your more exotic jobs in all
ways imaginable. See our instructions for that at:

Have a good day and thanks for flying TPA!

A.
