Something has been going on for a while and I haven’t seen any mention of it. I’m wondering if anyone else is experiencing it.
The problem is huge, unreasonable spikes of outgoing packets that cause RAM usage to max out and eventually cause Tor to crash. The interesting part is that even when you shut down Tor and restart it a few minutes later, it starts right from where it left off, and within about a minute you’re back where you were. See below:
The gaps are when I shut down Tor, and as you can see, the spike happens immediately after the restart, even when I start Tor 5 to 7 minutes later. Shouldn’t there be a period where Tor establishes new circuits when it restarts? Why does it pick up where it left off and continue sending data over previously established connections? Does this mean that this is an attack pointed directly at my specific relay and IP address?
I’m assuming all this traffic is going to one or more exit relays.
Sample log:
Excellent. Publishing server descriptor.
Jan 11 14:00:04.000 [notice] Bootstrapped 100% (done): Done
Jan 11 14:02:22.000 [notice] We're low on memory (cell queues total alloc: 4142541744 buffer total alloc: 304637952, tor compress total alloc: 43280 (zlib: 43264, zstd: 0, lzma: 0), rendezvous cache total alloc: 3465829). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 11 14:02:28.000 [notice] Removed 448201792 bytes by killing 23077 circuits; 266323 circuits remain alive. Also killed 0 non-linked directory connections. Killed 1 edge connections
Jan 11 14:02:28.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at src/core/or/circuitlist.c:2747) hasn't sent end yet? (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] tor_bug_occurred_(): Bug: src/core/or/connection_edge.c:1086: connection_edge_about_to_close: This line should not have been reached. (Future instances of this warning will be silenced.) (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: Tor 0.4.8.10: Line unexpectedly reached at connection_edge_about_to_close at src/core/or/connection_edge.c:1086. Stack trace: (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(log_backtrace_impl+0x5b) [0x55f92817a82b] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(tor_bug_occurred_+0x18a) [0x55f928191d7a] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(connection_about_to_close_connection+0x6c) [0x55f92823711c] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(+0x6cb3e) [0x55f9280f9b3e] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(+0x6cee8) [0x55f9280f9ee8] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /lib64/libevent-2.1.so.7(+0x24958) [0x7f2b6ad1d958] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /lib64/libevent-2.1.so.7(event_base_loop+0x577) [0x7f2b6ad1f2a7] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(do_main_loop+0x127) [0x55f9280fdb17] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(tor_run_main+0x205) [0x55f928101b35] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(tor_main+0x4d) [0x55f928101f5d] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(main+0x1d) [0x55f9280f4cad] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /lib64/libc.so.6(+0x3feb0) [0x7f2b6a43feb0] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /lib64/libc.so.6(__libc_start_main+0x80) [0x7f2b6a43ff60] (on Tor 0.4.8.10 )
Jan 11 14:02:28.000 [warn] Bug: /usr/bin/tor(_start+0x25) [0x55f9280f4d05] (on Tor 0.4.8.10 )
Jan 11 14:02:34.000 [notice] Performing bandwidth self-test...done.
Jan 11 14:02:58.000 [notice] We're low on memory (cell queues total alloc: 4073659392 buffer total alloc: 369604608, tor compress total alloc: 0 (zlib: 0, zstd: 0, lzma: 0), rendezvous cache total alloc: 3978040). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 11 14:03:03.000 [notice] Removed 444758160 bytes by killing 23863 circuits; 275706 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
Jan 11 14:03:47.000 [notice] We're low on memory (cell queues total alloc: 4118419008 buffer total alloc: 324532224, tor compress total alloc: 0 (zlib: 0, zstd: 0, lzma: 0), rendezvous cache total alloc: 4539033). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 11 14:03:50.000 [notice] Removed 445017936 bytes by killing 22438 circuits; 287151 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
I noticed this exact behaviour for the first time last week on one of my nodes. The downtime was very minimal for me, but I can imagine the instability is bad for users. I hadn’t really looked into it so far, as it isn’t a big issue for me yet. I detected it through basic metrics, as you can imagine.
I’ll see later this week if I can see some useful info in the logs. Let me know if I can help.
Well, the downtime may be a lot longer than you imagine. I have set up remote monitoring for my relays and I get an alert when the port is unresponsive. In my experience, even though Tor is running, it won’t accept new connections. In my case Tor is sometimes unresponsive for 10 minutes, then it accepts new connections again, and of course within minutes it’s back to being unresponsive. This can keep going for an hour or more.
In other words, you can see traffic but most of it is not new connections. It’s simply busy processing the existing connections initiated by the attack.
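For anyone who wants to replicate that kind of check, here is a minimal sketch of an external probe. The host, port, and alert command are placeholder assumptions rather than details from this thread, and the exact nc flags vary slightly between netcat variants.

    #!/bin/sh
    # Probe the relay's ORPort from another machine and alert if it stops answering.
    # HOST, PORT, and the mail recipient are example values; adjust for your relay.
    HOST=203.0.113.10
    PORT=9001

    if ! nc -z -w 5 "$HOST" "$PORT"; then
        # Replace with whatever notification channel you already use.
        echo "Relay $HOST:$PORT is not accepting connections" \
            | mail -s "relay alert: $HOST:$PORT unresponsive" operator@example.com
    fi

Run it from cron on a separate machine, for example every couple of minutes.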
I had the same issue on one of my relays. I saw hundreds of short-lived connections too. I added some iptables rules and it seems to have helped, but I’m now getting much less throughput through my relay.
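(The exact rules weren’t posted, so the following is only a sketch of the kind of per-source connection cap that’s commonly used for this. The ORPort of 9001 and the limit of 2 concurrent connections per address are assumptions you’d want to adjust.)

    # Cap concurrent ORPort connections per source IP.
    iptables -A INPUT -p tcp --dport 9001 --syn -m connlimit --connlimit-above 2 -j DROP
    # Same idea for IPv6 if the relay is dual-stack.
    ip6tables -A INPUT -p tcp --dport 9001 --syn -m connlimit --connlimit-above 2 -j DROP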
The throughput you were previously experiencing included a lot of garbage sent and received due to the attack so it’s reasonable to have less usage once the attackers are blocked.
Once the authorities see you’re doing less than you’re capable of, new circuits will be built through your relay, new users will choose it, and the throughput will eventually come back up. Only this time, you’ll mostly be processing legitimate connections.
I wonder if this has something to do with what I just experienced.
I restarted the machine that hosts a few of my instances so I could solve some VPN connectivity issues. After bootstrapping, you can see how CPU and RAM usage increases (as new circuits are created), and after roughly 2 hours there’s a big drop for no apparent reason: no crashes, no warnings, nothing in the logs. The info in Nyx looks fine. Usage is now at 40-55%, which is unusual, as it used to be at 65-80%.
Despite the CPU usage drop, RAM is still being consumed; there’s no drop there at all.
The iptables setup doesn’t completely prevent it but it greatly reduces the impact. We’re preventing them from creating multiple connections but they can still pack a punch with the one or two connections they’re allowed.
Remember, some of those packets are coming from other relays and we don’t want to completely ban a lot of other relays in the network.
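One middle ground, sketched below, is to rate-limit how fast any single address may open new ORPort connections instead of blocking it outright, so relays that reconnect occasionally are unaffected while rapid-fire connection attempts get dropped. The port and thresholds here are assumptions, not values from the script being discussed.

    # Drop new connection attempts from a source IP that exceeds the rate,
    # without adding the address to a permanent block list.
    iptables -A INPUT -p tcp --dport 9001 --syn \
      -m hashlimit --hashlimit-name tor-orport-syn --hashlimit-mode srcip \
      --hashlimit-above 30/minute --hashlimit-burst 60 -j DROP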
Well, the iptables script puts a bunch of IP addresses on the block list based on their behavior, mainly their attempts to open many concurrent connections. Sometimes your relay is the point of entry, and sometimes you’re just processing packets coming from other relays that are under attack.
There’s only so much you can do with iptables, though. The fact that Tor gets overloaded just by processing the packets tells me that the best way to block this attack would be at the application layer. In other words, Tor should recognize the bogus requests and simply not process them.
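For what it’s worth, Tor does ship an application-layer DoS-mitigation subsystem whose parameters can be overridden in torrc; whether it catches this particular attack is another question. Below is a sketch only; the values are illustrative rather than recommendations, since the defaults normally come from the network consensus, and the path may differ on your system.

    # Append overrides to torrc and reload tor.
    cat >> /etc/tor/torrc <<'EOF'
    DoSCircuitCreationEnabled 1
    DoSCircuitCreationMinConnections 2
    DoSCircuitCreationRate 20
    DoSCircuitCreationBurst 60
    DoSConnectionEnabled 1
    DoSConnectionMaxConcurrentCount 50
    MaxMemInQueues 2 GB
    EOF
    systemctl reload tor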
Issues are getting worse for me; almost once per day a relay goes down. I’m considering writing a script to restart my relays preventively to reduce downtime. Is anyone else seeing the same thing?
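If you go that route, one low-effort sketch is a systemd drop-in so the service at least comes back automatically after a crash or OOM kill. The unit name may be tor@default rather than tor depending on your distribution, and some packages already set Restart=on-failure, in which case this changes nothing.

    # Create a drop-in that restarts tor whenever it exits abnormally.
    mkdir -p /etc/systemd/system/tor.service.d
    cat > /etc/systemd/system/tor.service.d/restart.conf <<'EOF'
    [Service]
    Restart=on-failure
    RestartSec=30
    EOF
    systemctl daemon-reload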
I find that restarting the service, or even the entire VM, doesn’t help; it goes right back into the bad state. I let it ride for a while and it recovered on its own after several OOM kills.
Is there any advice from the official Tor devs? Perhaps some config we can change? I’m seeing the persistence as well, even after a reboot. The impact on my relays’ health is serious at this point: I have to intervene daily, and I now have 3 relays with descriptor errors, which has never happened before. If there is any info I can collect for debugging, let me know.
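In the meantime, the material that usually helps with a report is the version, the journal or log excerpts around an incident, and a rough connection count. A sketch of how to grab those is below; the unit name, log path, and ORPort 9001 are assumptions that may differ on your system.

    # Collect basic debugging material around an incident.
    tor --version > tor-debug.txt
    journalctl -u tor --since "2 hours ago" >> tor-debug.txt
    grep -E "Heartbeat|low on memory|Killing circuits" /var/log/tor/notices.log | tail -n 100 >> tor-debug.txt
    # Rough count of established ORPort connections at the time of the spike.
    ss -tn state established '( sport = :9001 )' | wc -l >> tor-debug.txt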