Tor-Relay shows unusual traffic and connection issues

DbT · December 7, 2023, 12:23am

Hey there,

I’m new to Tor and have had my first experiences operating a Tor non-exit relay. However, I have noticed some things that made me a bit suspicious. When I start the relay, it works quite normally for the first few hours. Traffic is normal and nothing special happens. After it has been running for a while, it starts to behave a bit weirdly. The average upload rate is noticeably higher than the average download rate - almost 10% more upload. I am not sure what to make of this observation. Is this normal? If my relay just “puts traffic through”, download and upload should - at least almost - be the same, right? I was worried that my relay might have been compromised and is being used as a server/service.
The next problem occurs only after the relay has been running for a while. The server (VPS, Linux, 1 GiB RAM, 1 vCore) becomes very unresponsive. Establishing an SSH connection or starting nyx takes much more time than it should. This is not directly related to traffic. When the relay hasn’t been running for long, performance is good. Additionally, a lot of notifications like “nf_conntrack: table full, dropping packet” and “callbacks suppressed” are produced. Tor itself still seems to operate normally in this situation: traffic is normal and flags like “Fast” and “Running” remain unchanged.
What do you think about this? Have I been doing something wrong, and what can I do about it?

Best
D

Vort · December 7, 2023, 7:09am

There are two possibilities:

Either it is a DDoS attack on your relay, like the one that has been hitting my relay since yesterday.
Or your computer just doesn’t have enough resources to run a relay.

In my case, upload was 500% larger than download. That’s unusual. A 10% difference is probably fine.
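To put the two figures side by side, here is a minimal sketch of the arithmetic. The function name and the sample rates are made up for illustration; "500% larger" is read here as upload being six times the download rate.

```python
def upload_excess_percent(download_rate: float, upload_rate: float) -> float:
    """Return how much larger upload is than download, in percent."""
    if download_rate <= 0:
        raise ValueError("download_rate must be positive")
    return (upload_rate - download_rate) / download_rate * 100.0


# Hypothetical average rates in MB/s, as one might read them off nyx.
print(round(upload_excess_percent(1.0, 1.1), 1))  # roughly 10% more upload
print(round(upload_excess_percent(1.0, 6.0), 1))  # 500% more upload
```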

I suggest you check how much RAM the Tor process consumes. Maybe that’s the reason.
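One way to check this on Linux is to read the VmRSS field from /proc/<pid>/status. The sketch below is only an illustration: the helper names are made up, and it uses its own PID as a stand-in, so the PID of the tor process has to be substituted by hand.

```python
import os
import re
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_mib(pid: int) -> Optional[float]:
    """Return the resident memory of a process in MiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            rss = parse_vm_rss_kib(status_file.read())
    except OSError:
        return None
    return None if rss is None else rss / 1024.0


if __name__ == "__main__":
    # Stand-in PID (this script itself); replace with the PID of tor,
    # for example the number printed by `pidof tor`.
    print(rss_mib(os.getpid()))
```

Comparing the result with the total RAM of the machine (1 GiB here) shows how much of it the relay itself is holding.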

DbT · December 7, 2023, 2:06pm

Thanks for your response. RAM usage has been around 50%, even when performance was bad. Is there a RAM usage limit for the tor process, or can it use as much as it needs? I was wondering if I can allocate more RAM to the process.
Are DoS attacks a common thing for Tor relays? Is there something I can do about them, or should I just continue to run the relay?

Vort · December 7, 2023, 3:06pm

It’s complicated.
Theoretically, memory usage is controlled by the MaxMemInQueues option in torrc.
It has a default value, which depends on the total amount of RAM available in the system.
However, for my node, especially under DDoS, this option has almost no effect.
But since you mentioned about 50% usage, I think the amount of RAM is not a problem for your relay.

You can also check CPU load. The number of used file descriptors is probably also worth checking. But I’m just guessing. Maybe someone else has better ideas.
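Both checks can be sketched in a few lines of Python on Linux. The helper names are made up for illustration, and reading /proc/<pid>/fd for the tor process usually needs root or the same user that tor runs as.

```python
import os


def load_per_core() -> float:
    """1-minute load average divided by the number of CPU cores."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def open_fd_count(pid: int) -> int:
    """Count open file descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Stand-in PID (this script itself); replace with the PID of tor.
    print(f"load per core: {load_per_core():.2f}")
    print(f"open file descriptors: {open_fd_count(os.getpid())}")
```

As a rough rule of thumb, a load per core that stays well above 1.0 suggests the single vCore is saturated, and a descriptor count close to the process limit (see `ulimit -n`) suggests that limit is the bottleneck.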

Also try pinging your relay when the problem starts to appear.

The “nf_conntrack: table full, dropping packet” message may be important. It looks like a message from the Linux firewall, but I have no idea how it is supposed to work. Maybe you can google what it means and how it can be tuned.
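For context, the kernel logs “nf_conntrack: table full, dropping packet” when the netfilter connection-tracking table has reached its limit (the net.netfilter.nf_conntrack_max sysctl). A minimal sketch for checking how full the table is, assuming the conntrack module is loaded so that the two /proc files exist; the helper names are made up:

```python
from pathlib import Path
from typing import Optional

PROC_DIR = Path("/proc/sys/net/netfilter")


def read_int(path: Path) -> Optional[int]:
    """Read a single integer from a /proc file, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def conntrack_usage(count: int, maximum: int) -> float:
    """Fraction of the connection-tracking table currently in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


if __name__ == "__main__":
    count = read_int(PROC_DIR / "nf_conntrack_count")
    maximum = read_int(PROC_DIR / "nf_conntrack_max")
    if count is None or maximum is None:
        print("conntrack counters not available")
    else:
        print(f"{count}/{maximum} entries ({conntrack_usage(count, maximum):.0%})")
```

If the count sits near the maximum while the relay is busy, raising nf_conntrack_max with sysctl is a commonly suggested adjustment, though whether that is appropriate on a 1 GiB machine is a judgment call.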

DoS attacks are not very common, but also not rare.
The method of mitigation depends on the attack type.
Sometimes it is enough to ban the IP addresses of the attackers.
Sometimes the relay just needs to be updated to the latest version with fixes.
Sometimes adding limits like the MaxMemInQueues option mentioned earlier, NumCPUs, or RelayBandwidthRate can help.
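As a rough illustration of what such limits look like in torrc, the sketch below prints a fragment with the three options named above. The option names come from this post; the values are assumptions chosen for a small 1 GiB, 1 vCore machine and are not recommendations, and the render_torrc helper is made up.

```python
def render_torrc(options: dict) -> str:
    """Render option/value pairs as torrc lines, one 'Name value' per line."""
    return "\n".join(f"{name} {value}" for name, value in options.items())


# Illustrative values only; these are assumptions, not tested settings.
limits = {
    "MaxMemInQueues": "512 MB",
    "NumCPUs": 1,
    "RelayBandwidthRate": "2 MBytes",
}

print(render_torrc(limits))
```

The printed lines would go into torrc, followed by a reload of the tor service.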

DbT · December 29, 2023, 8:23am

Now, after an update of Tor, I was able to run the relay for almost a week without noticeable issues. After that, the described problems occurred a few times, but it’s not a constant thing, so I think that’s fine. But traffic is quite low, with an average of 1 MB/s. However, Nyx was saying “[NOTICE] All current guards excluded by path restriction type 2; using an additional guard.” I don’t know what to make of this.

sunshinecowboy · January 2, 2024, 9:18pm

Not all single-core VPSes are created equal. Some can handle running a Tor relay, many can’t. This is the reason why your relay is struggling to maintain 1 MB/s. I’d recommend finding a better VPS provider and considering a 2-core option.

capole · January 7, 2024, 12:03am

I may be wrong, but 1 vCPU and 1 GB RAM sounds to me like the free tier of Oracle. I use the free tier as well and used to run my first relay for some time the same way you do, and I came across the exact same problem with the instance becoming unresponsive. I had a lot of service disruptions over time; it might work fine for a week or become unresponsive several times a day.

If you use the free tier, you can limit your account, remove the payment method (just in case) and use ARM compute. They offer 3,000 OCPU hours and something else for free, which is the equivalent of 4 vCPU and 24 GB RAM, all of this per month, so you’ll never run into any surprises if you limit yourself to those resources.

Thanks to migrating to ARM, I moved my first relay to this new machine and set up the other 2 without having to deal with any issues. I now host a machine with 2 vCPU and 8 GB RAM with these 3 nodes, but I may have to add one more vCPU, since with two nodes fully working and the other ramping up, CPU is already at 55-70%.