[tor-project] minutes from the sysadmin meeting


TPA held its last meeting of the year, and it was a big one because this
time we welcomed the UX and community folks to talk about web things.

# Roll call: who's there and emergencies

* anarcat
* gaba
* gus
* kez
* lavamind
* nah

# Final roadmap review before holidays

What are we *actually* going to do by the end of the year?

See the 2021 roadmap, which we'll technically be closing this month.

Here are the updates:

* blog migration done!
* discourse instance now in production!
* jenkins (almost) fully retired (just needs to pull rouyi and the
   last builder off, waiting for the Debian package tests)
* tpa mailing list *will* be created
* submission server ready, waiting for documentation for launch
* donate website rewrite postponed to after the year-end campaign
* bridges.torproject.org not necessarily deployed before the
   holidays, but a priority

## Website redesign retrospective

Gus gave us a quick retrospective on the major changes that happened
on the websites in the past few years.

The website migration started in 2018, based on a new design made by
Antonela. At the Tor Dev Meeting in Rome, we discussed how to do the
migration. The team was antonela (design), hiro (webdev), alison and
gus (content), steph (comms), pili (pm), and emmapeel (l10n).

The main webpage was totally redesigned, and support.tpo was created as
a new portal. Some docs from Trac and RT articles were imported into
support.tpo.

Lektor was chosen because:

- localisation support
- static site generator
- written in Python
- can provide a web interface for editors

But dev.tpo was never launched. We have a spreadsheet (started with
duncan at an All Hands meeting in early 2021) with content that still
needs to be migrated. We didn't have enough people to do this so we
prioritized the blog migration instead.

### Where we are now

We're using lektor mostly everywhere, except metrics, research, and
status:

* metrics and research portal was separate, developed in hugo. irl
   made a bootstrap template following the styleguide
* status was built by anarcat using hugo because there was a solid
   "status site" template that matched

A lot of content was copied to the support and community portals, but
some docs are only available in the old site (2019.www.tpo). We
discussed creating a docs.tpo for documentation that doesn't need to
be localized and not for end-users, more for advanced users and

So what do we do with docs.tpo and dev.tpo next? dev.tpo just needs to
happen. It was part of sponsor9, and was never completed. docs.tpo was
for technical documentation. dev.tpo was a presentation of the
project. dev.tpo is like a community portal for devs, not
localized. It seems docs.tpo could be part of dev.tpo, as the
distinction is not very clear.

## web OKR 2022 brainstorm

To move forward, we did a quick brainstorm of a roadmap for the web
side of TPA for 2022. Here are the ideas that came out:

* check if bootstrap needs an upgrade for all websites
* donation page launch
* sponsor 9 stuff: we collected UX feedback for the portals, and the
   web team needs to fix the issues we found; need to prioritise
* new bridge website (sponsor 30)
* dev portal, just do it (see [issue 6])

[issue 6]: get first iteraion of design and code for dev.torproject.org (#6) · Issues · The Tor Project / Web / dev · GitLab

We'll do another meeting in jan to make better OKRs for this.

We also need to organise with the new people:

* onion SRE: new OTF project USAGM, starting in february
* new community person

The web roadmap should live somewhere under the [web wiki] and be
cross-referenced from the [TPA roadmap section].

[web wiki]: Home · Wiki · The Tor Project / Web / Web Team · GitLab
[TPA roadmap section]: 2022 · Wiki · The Tor Project / TPA / TPA team · GitLab

## Systems side

We didn't have time to review the TPA dashboards, and have delegated
this to the next weekly checkin, on December 13th.

* Development · Boards · The Tor Project / TPA / TPA team · GitLab
* Development · Boards · TPA · GitLab

# Holidays

Who's AFK when?

* normal TPI: dec 22 - jan 5 (incl.)
* anarcat: dec 22 - jan 10th, will try to keep a computer around and
   not work, which is hard
* kez: normal TPI, will be near a computer, checking on things from
   time to time
* lavamind: normal TPI (working on monday or tuesday 20/21, friday
   7th), will be near a computer, checking on things from time to time

TPA folks can ping each other on signal if they see something and need
help, or just take care of it themselves.

Let's keep doing the triage rotation, which means the following weeks:

* week 50 (dec 5-11): lavamind
* week 51 (dec 12-18): anarcat
* week 52 (dec 19-25): kez
* week 1 2022 (dec 26 - jan 1 2022): anarcat
* week 2 (jan 2-8 2022): lavamind
* week 3 (jan 9-15 2022): kez

anarcat and lavamind swapped the last two weeks; the normal schedule
(anarcat/kez/lavamind) should resume after.

The idea is *not* to work as much as we currently do, but only check
for emergencies or "code red". As a reminder, this policy is defined
in [TPA-RFC-2], [support levels]. The "code red" example does not
currently include GitLab CI, but considering the rise in usage of that
service and the pressure on the shadow simulations, we may treat major
outages on runners as a code red during the vacations.

[TPA-RFC-2]: tpa rfc 2 support · Wiki · The Tor Project / TPA / TPA team · GitLab
[support levels]: tpa rfc 2 support · Wiki · The Tor Project / TPA / TPA team · GitLab

# Other discussions

We need to review the dashboards during the next checkin.

We need to schedule an OKR session for the web team in January.

# Next meeting

No meeting was scheduled for next month. Normally, it would fall on
January 3rd 2022, but considering we'll be on vacation during that
time, we should probably just schedule the next meeting on January

# Metrics of the month

* hosts in Puppet: 88, LDAP: 88, Prometheus exporters: 139
* number of Apache servers monitored: 27, hits per second: 176
* number of Nginx servers: 2, hits per second: 0, hit ratio: 0.81
* number of self-hosted nameservers: 6, mail servers: 8
* pending upgrades: 0, reboots: 0
* average load: 1.68, memory available: 3.97 TiB/4.88 TiB, running processes: 694
* disk free/total: 35.46 TiB/84.64 TiB
* bytes sent: 340.91 MB/s, received: 202.82 MB/s
* [GitLab tickets]: 164 tickets including...
   * open: 0
   * icebox: 142
   * backlog: 10
   * next: 8
   * doing: 2
   * (closed: 2540)

[GitLab tickets]: Development · Boards · The Tor Project / TPA / TPA team · GitLab

We're already progressing towards our Debian bullseye upgrades: 11 out
of those 88 machines have been upgraded. We did retire a few buster
boxes however, which helped: we had a peak of 91 machines, in October
*and* early December, which implies we have quite a bit of churn in
the number of machines created and destroyed, which is interesting in
its own right.

We do not have a completion date yet, but considering that (a) the first
bullseye hosts were introduced in September and (b) that we have ~12.5%
of the hosts upgraded, it will take us another 21 months (or 7 quarters,
almost two years!) to complete the upgrade at this rate. Obviously, a few
work sessions will be required to meet our planned deadline (next summer).
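
For the curious, here is a rough sketch of how that estimate can be
reproduced. It assumes a constant upgrade rate and about three months
elapsed (September to December); the function name is made up for this
example and is not part of any TPA tooling.

```python
def months_remaining(upgraded, total, months_elapsed):
    """Linear extrapolation: hosts left divided by the observed rate."""
    rate = upgraded / months_elapsed  # hosts upgraded per month so far
    return (total - upgraded) / rate


upgraded, total = 11, 88  # figures from the metrics above
months_elapsed = 3        # assumption: September to December

print(f"{upgraded / total:.1%} upgraded")  # 12.5% upgraded
print(f"{months_remaining(upgraded, total, months_elapsed):.0f} months remaining")  # 21 months remaining
```

This is only a straight-line projection: it ignores the churn in the
number of machines and any batch upgrade sessions, which is exactly
what we would need to beat it.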


Antoine Beaupré
torproject.org system administration