[tor-relays] Running a high-performance pluggable transports Tor bridge (FOCI 2023 short paper)

Linus Nordberg and I wrote a short paper that was presented at FOCI
2023. The topic is how to use all the available CPU capacity of a server
running a Tor relay.

This is how the Snowflake bridges are set up. It might also be useful
for anyone running a relay that is bottlenecked on the CPU. If you have
ever run multiple relays on one IP address for better scaling (if you
are one of the relay operators affected by the recent
AuthDirMaxServersPerAddr change), you might want to experiment with this
setup. The difference is that all the instances of Tor have the same
relay fingerprint, so they operate like one big relay instead of many
small relays.

https://www.bamsoftware.com/papers/pt-bridge-hiperf/

The pluggable transports model in Tor separates the concerns of
anonymity and circumvention by running circumvention code in a
separate process, which exchanges information with the main Tor
process over local interprocess communication. This model leads to
problems with scaling, especially for transports, like meek and
Snowflake, whose blocking resistance does not rely on there being
numerous, independently administered bridges, but which rather forward
all traffic to one or a few centralized bridges. We identify what
bottlenecks arise as a bridge scales from 500 to 10,000 simultaneous
users, and then from 10,000 to 50,000, and show ways of overcoming
them, based on our experience running a Snowflake bridge. The key idea
is running multiple Tor processes in parallel on the bridge host, with
externally synchronized identity keys.

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

The workshop presentation video (22 minutes) of this paper has just
become available on YouTube. The paper homepage has a copy of the video
too.

The other FOCI 2023 issue 2 videos are online as well:

On Mon, Sep 04, 2023 at 02:09:50AM -0600, David Fifield wrote:

Running a high-performance pluggable transports Tor bridge


The key is that every instance of tor must have a different nickname.
That way, even though they all have the same relay identity key, Tor
Metrics knows to count all the descriptors separately.

So, for instance, on one snowflake bridge (identity
2B280B23E1107BB62ABFC40DDCC8824814F80A72), we use nicknames:
  flakey1, flakey2, …, flakey12
and on another bridge (identity 8838024498816A039FCBBAB14E6F40A0843051FA)
we use nicknames:
  crusty1, crusty2, …, crusty12
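As a rough illustration of the naming scheme, here is a minimal Python sketch that prints one torrc fragment per instance, each with its own nickname. Nickname, BridgeRelay, ORPort, ExtORPort, SocksPort, and DataDirectory are ordinary torrc options, but the port numbers, the directory layout, and the helper names are assumptions made up for this example; this is not the configuration the Snowflake bridges actually use. The fragments are deliberately incomplete, and copying the shared identity keys into each data directory is not shown.

```python
def instance_nicknames(base, count):
    """Return the nicknames base1, base2, ..., base<count>."""
    return [f"{base}{i}" for i in range(1, count + 1)]


def torrc_fragment(nickname, index, base_port=10000,
                   data_root="/var/lib/tor-instances"):
    """Build an illustrative torrc fragment for one tor instance.

    base_port and data_root are hypothetical values for this sketch.
    Every instance gets a distinct Nickname, ExtORPort, and DataDirectory.
    """
    lines = [
        f"Nickname {nickname}",
        "BridgeRelay 1",
        "SocksPort 0",
        "ORPort 127.0.0.1:auto",
        f"ExtORPort 127.0.0.1:{base_port + index}",
        f"DataDirectory {data_root}/{nickname}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Twelve instances, named like the first bridge above.
    for i, nick in enumerate(instance_nicknames("flakey", 12), start=1):
        print(f"# instance {i}")
        print(torrc_fragment(nick, i))
```

Only the Nickname line matters for the metrics question; the other options are there to show that each instance also needs its own ports and data directory.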

Instructions for setting up nicknames can be found at

It used to be the case that Tor Metrics did not understand the
descriptors of this kind of multi-instance bridge. If you had N
instances, it would count only 1 of them per time period. But Tor
Metrics has now known about this kind of bridge (multiple descriptors
per time period with the same identity key but different nicknames) for
more than a year:

Relay Search still does not know about multi-instance bridges, though.
If you look up such a bridge, it will display one of the multiple
instances more or less at random. In the case of the current snowflake
bridges, you have to multiply the numbers on Relay Search pages by 12 to
get the right numbers.
https://metrics.torproject.org/rs.html#details/2B280B23E1107BB62ABFC40DDCC8824814F80A72
https://metrics.torproject.org/rs.html#details/8838024498816A039FCBBAB14E6F40A0843051FA
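To make the counting difference concrete, here is a small Python sketch. It is not Tor Metrics code: the record format (fingerprint, nickname, users) and the user counts in the usage example are invented for illustration. The first function sums over every distinct nickname that shares an identity key, which is the behavior described above for multi-instance bridges. The second is the rough "multiply by the number of instances" correction for a figure read off a Relay Search page, which assumes the instances carry roughly equal load.

```python
from collections import defaultdict


def total_users_by_identity(records):
    """Sum user counts over all instances that share an identity key.

    records: iterable of (fingerprint, nickname, users) tuples for one
    time period (a hypothetical format). If the same (fingerprint,
    nickname) pair appears more than once, the last value wins.
    """
    per_instance = {}
    for fingerprint, nickname, users in records:
        per_instance[(fingerprint, nickname)] = users

    totals = defaultdict(float)
    for (fingerprint, _nickname), users in per_instance.items():
        totals[fingerprint] += users
    return dict(totals)


def scale_relay_search(value, instances=12):
    """Approximate a whole-bridge figure from a single-instance figure."""
    return value * instances


if __name__ == "__main__":
    fp = "2B280B23E1107BB62ABFC40DDCC8824814F80A72"
    # Made-up user counts for three of the instances.
    sample = [(fp, "flakey1", 100.0), (fp, "flakey2", 150.0), (fp, "flakey3", 125.0)]
    print(total_users_by_identity(sample))
    print(scale_relay_search(100.0))
```

Summing per nickname is exact when all descriptors are available; the multiplication is only an estimate, since Relay Search shows an arbitrary single instance.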

There's a special repository for making graphs of snowflake users. This
was necessary in the time before Tor Metrics natively understood
multi-instance bridges, and I still use it because it offers some extra
flexibility over what metrics.torproject.org provides. With some small
changes, the same code could work for other pluggable transports, or
even single bridges.

This is a sample of the graph output:

On Mon, Dec 11, 2023 at 08:13:17PM +0100, Felix wrote:

Thank you for the paper and the presentation.

Chapter 3 (Multiple Tor processes) shows the structure:

> mypt - HAproxy = multiple tor services

At the end of chapter 3.1 it is written
> the loss of country- and transport-specific metrics

How will the metrics data be pulled out of the multiple tor services to
fetch *all* metrics data? Or will only one of them be looked at, without
full data representation?