Excessive / Unbalanced Relay Traffic

Hi all,

I’ve been running Tor middle relays for well over a decade now so I like to consider myself pretty experienced with the trials and tribulations of being a relay operator. With that said, I’m seeing some unusual traffic patterns of late which are new to me and I’m wondering if anyone could shed any light on what may be the cause.

My relay maurice (62BAB7516A0D3F1E6B2BC94767592791F6B58FB0), a virtual machine running on Proxmox VE, recently had an unscheduled restart due to a hardware failure on the host machine and was offline for about 36 hours. I remediated the hardware issue and upgraded Tor to the latest version (0.4.8.5) at the same time as bringing the server back online. The guard flag was lost due to the outage but quickly regained after restarting the server, and everything else, as best as I can tell, has remained the same, with the exception of the traffic usage.

Normally the server would upload slightly more than it downloaded, totalling a few GB over the course of a day, but I am now finding that for extended periods of time the upload is fully utilised at the bandwidth cap I’ve set (12.5 MB/s, bursting to 25 MB/s), whereas the download is way below it, running at around 30% of the cap. Initially this happened right after restarting the server and I thought it may have been due to losing the guard flag (although on second thought that should make the traffic more evenly balanced, rather than less), but it has continued for most of the last 2 weeks, with only a 3-day window between 10/9 and 13/9 where traffic seemed to be running normally. The graphs available on the Tor Project Metrics page illustrate this imbalance, with the lines showing bytes read and bytes written being significantly apart.
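
In case anyone wants to reproduce the measurement on their own relay, a minimal sketch along these lines (assuming ControlPort 9051 with cookie authentication is enabled and the stem library is installed; adjust to your own setup) samples the controller’s traffic counters over a short window:

```python
# Minimal sketch: sample Tor's own byte counters over a short window to see
# how far read and written traffic diverge. Assumes ControlPort 9051 and
# CookieAuthentication are enabled in torrc, and that stem is installed.
import time
from stem.control import Controller

SAMPLE_SECONDS = 60

with Controller.from_port(port=9051) as controller:
    controller.authenticate()  # cookie auth by default

    read0 = int(controller.get_info('traffic/read'))
    written0 = int(controller.get_info('traffic/written'))
    time.sleep(SAMPLE_SECONDS)
    read_delta = int(controller.get_info('traffic/read')) - read0
    written_delta = int(controller.get_info('traffic/written')) - written0

    print('read:    %.2f MB/s' % (read_delta / SAMPLE_SECONDS / 1e6))
    print('written: %.2f MB/s' % (written_delta / SAMPLE_SECONDS / 1e6))
    print('written/read ratio: %.2f' % (written_delta / max(read_delta, 1)))
```

Running that for a minute or two during one of the busy periods should show the written/read ratio directly.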

Can anyone explain this anomaly? Is this traffic ‘normal’ and can be safely ignored or should I undertake some further investigation? Are any other relay operators seeing anything similar?

Some details on the system:

OS: Debian 11 Bullseye - Linux 5.10.0-25-amd64 #1 SMP Debian 5.10.191-1 (2023-08-16) x86_64 GNU/Linux
Tor: 0.4.8.5 - Libevent 2.1.12-stable, OpenSSL 1.1.1n, Zlib 1.2.11, Liblzma 5.2.5, Libzstd 1.4.8 and Glibc 2.31 as libc, compiled with GCC version 10.2.1
HW: Proxmox VM with 4 VCPU, 8GB RAM, 24GB RAID10 ZFS SSD boot device.

Thanks in advance,
M


No, I have no explanation for the behavior, but since the traffic was briefly normal between September 9th and 13th, your Guard relay may be attempting to balance the traffic on its own. It is your call whether or not you want to investigate this further, but I would personally monitor the status every few hours and see if the situation corrects itself. In contrast, my Tor guard relay has no recent history of abnormal traffic.

I’m not sure what you mean by this. I’ve never seen this behaviour before, with the upload and download rates being so out of sync with each other, and I can’t see how it would be related to being an entry guard or not. As best as I understand it, it’s approximately ‘one packet in, one packet out’ and vice versa, with a bit of overhead for things like rendezvous points, communicating with the directory authorities, etc. I initially thought it might be due to the relay being chosen as the RVP for a busy hidden service, but I don’t think that explains such a discrepancy in the traffic rates.

Maybe your relay is currently under DDoS. Enkidu-6 and @toralf have put iptables rules on GitHub.
In general I can recommend vnstat and nload for monitoring network traffic and tracking real-time bandwidth usage.
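
If you would rather not install anything extra, a quick sketch like the following (plain Python reading the kernel’s counters from /proc/net/dev; the interface name is just an example) gives a similar real-time view of rx/tx rates:

```python
# Minimal rx/tx rate sampler reading the kernel counters in /proc/net/dev
# (Linux only). 'eth0' is just an example; use your relay's actual interface.
import time

IFACE = 'eth0'
INTERVAL = 5  # seconds between samples

def counters(iface):
    with open('/proc/net/dev') as f:
        for line in f:
            if line.strip().startswith(iface + ':'):
                fields = line.split(':', 1)[1].split()
                return int(fields[0]), int(fields[8])  # rx bytes, tx bytes
    raise RuntimeError('interface %r not found' % iface)

rx0, tx0 = counters(IFACE)
while True:
    time.sleep(INTERVAL)
    rx1, tx1 = counters(IFACE)
    print('rx %6.2f MB/s   tx %6.2f MB/s'
          % ((rx1 - rx0) / INTERVAL / 1e6, (tx1 - tx0) / INTERVAL / 1e6))
    rx0, tx0 = rx1, tx1
```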

I think it could match quite well with the increase in written dir bytes on the directory-mirror side:

https://metrics.torproject.org/dirbytes.html?start=2023-08-31&end=2023-09-19

Now, who or what is increasing those written bytes is not really clear at this point.
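
For anyone who wants the raw numbers behind that graph: the Metrics graphs can also be fetched as CSV (assuming the .csv variant of the graph URL is still served; the exact column names may differ), for example:

```python
# Dump the raw data behind the dir-bytes graph. The .csv variant of the graph
# URL is an assumption on my part; skip any leading comment lines and print
# the header plus the most recent rows.
import csv
import io
import urllib.request

URL = ('https://metrics.torproject.org/dirbytes.csv'
       '?start=2023-08-31&end=2023-09-19')

with urllib.request.urlopen(URL) as resp:
    text = io.TextIOWrapper(resp, encoding='utf-8')
    rows = [row for row in csv.reader(text)
            if row and not row[0].startswith('#')]

print(rows[0])          # header row
for row in rows[-7:]:   # last few data points
    print(row)
```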

Georg

I see this too on one of my relays.
The load comes in waves of varying lengths and temporarily knocked the relay out of the consensus, because it used the whole available upload and caused 100% CPU load.

The IP addresses that I tracked down are both IPv4 and IPv6 and all seem to belong to “Akamai Connected Cloud”, but I don’t know how many of them are involved.
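
In case anyone else wants to do a similar count, a rough sketch like this (shelling out to ss from iproute2; 9001 is only a placeholder for your actual ORPort) tallies established connections per peer address:

```python
# Rough tally of established connections per peer address. Assumes the ss
# tool from iproute2 is available; 9001 is only a placeholder ORPort.
import collections
import subprocess

ORPORT = 9001

out = subprocess.run(['ss', '-ntH', 'sport', '=', ':%d' % ORPORT],
                     capture_output=True, text=True, check=True).stdout

counts = collections.Counter()
for line in out.splitlines():
    fields = line.split()
    if not fields:
        continue
    peer = fields[-1].rsplit(':', 1)[0]  # peer address:port is the last column
    counts[peer] += 1

for addr, n in counts.most_common(20):
    print('%5d  %s' % (n, addr))
```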

I guess we wait it out and hope it stops soon?

Thanks for the suggestion, I’ll have a look at the repos. I’ve got monitoring from Zabbix and the Proxmox hypervisor available, and use nyx on the CLI to monitor the traffic on the VM itself, so I likely don’t need vnstat/nload, but I’ll bear them in mind.

The level of amplification that seems to be occurring is quite worrying; it looks to be somewhere between 3x and 4x here at a quick glance at the latest ‘wave’.

I’ve had this issue, about 4 or 5 distinct waves.

Not had this one (yet). I’ll dig into my monitoring later this week when I get a chance and post some graphs to see if anyone else is seeing the same (and also because graphs r kool).

M

The fascinating thing is that it started at the same time and only ever affects the same relays.
Of all my servers, only 2 are affected. Each one has 40 exit or guard instances running.

Of the 40 guards, only one instance is affected:
E1AF5373E3240566B598FA481AD3860549F6168B

Of the 40 exits, only 4 instances are affected:
BEE071E521A47C740C9F6184FEBCF78BFF5F1275
24FDF4754BB3775A6D54E078DCBBCA43D7B1B07E
4A169C0A14E41F647D009EC49D28A3D11629DAF0
A398080A6A72F828DC4476DE45E28C5892CA1070

So, as promised, some graphs:

Several incidents are evident, beginning on 5th September. The sharp drop late on the 20th was when I reduced the server’s RelayBandwidthRate as my upload usage was becoming a bit of a concern.

CPU usage also appears to correlate, although I’d expect this, as handling more traffic inevitably means higher CPU usage.

The software was upgraded from 0.4.8.5 to 0.4.8.6 around 23:00 BST on 19th September. As shown on the graphs, the increase in upload speed persisted. At the time of writing, traffic appears to have returned to normal.

M


Since my last post, things have generally been quiet, but similar traffic patterns have resumed over the last few days.

For information, I upgraded to 0.4.8.7 yesterday.

Same here for me but they haven’t crashed the Tor process yet.

Did anyone find out what exactly they are doing or who they are?

It looks as if the last ‘attack’ on my relay finished on 23rd October and traffic appears to have returned to normal.

I still have no idea who or what caused the traffic anomaly.
