by gk | April 25, 2022
Running relays is a significant contribution to our project and we've designed that process so that the barrier of entry is low, making it possible for a variety of people with different backgrounds to participate. This openness is important as it makes our network (and the privacy guarantees it offers) more robust and resilient to attacks. However, the same low barrier to contributing to our network also makes it easier for malicious operators to attack our users, e.g. via Man-in-the-Middle (MitM) attacks at exit nodes.
This blog post explains what we're doing to detect malicious actors (and remove their relays), how we developed these strategies, and what we're working on to make it harder for bad operators to run attacks. Additionally, we want to shine some light on this part of our day-to-day work at Tor. Because this is an arms race, we have to balance being transparent with effective detection of malicious actors. In this post we hope to offer more transparency about our approach without compromising the methods we use to keep our users safe.
What does bad-relay work look like?
Whether a relay is "bad" or "malicious" is often not as clear-cut as it might sound at first. Maybe the relay in question is just misconfigured and is, e.g., missing family settings (for the family configuration option see section 5 in our post-install instructions). Does that mean the operator has nefarious intentions? To help us react to those situations, we have developed a set of criteria and a process for dealing with potentially bad relays. Now, when we get a report or detect suspicious behavior, we follow that process. First, we try to fix the underlying issue together with the operator, and if that does not work or the behavior is outright malicious, we propose the relay for rejection (or maybe assigning the
There is an important point to be made here about the process of removing relays from the network: while we do have staff who watch the network and review reports from volunteers, it is not Tor staff who reject relays (and that is by design). Instead, rejecting relays from the network is the job of the directory authorities. Directory authorities are special-purpose relays that maintain a list of currently running relays and periodically publish a consensus together with the other directory authorities. Directory authorities are run mostly by trusted volunteers from our community. They rely on recommendations from the Network Health team to evaluate and address malicious relays, but they only reject a potentially malicious relay if the majority of directory authorities agree to do so.
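As a simplified illustration of the majority rule described above (not the actual voting or consensus code in tor), the rejection decision can be sketched as a strict-majority count. The function name `should_reject`, the authority names, and the choice to count authorities that did not vote as "do not reject" are assumptions made for this example.

```python
from typing import Mapping


def should_reject(votes: Mapping[str, bool], total_authorities: int) -> bool:
    """Return True if a strict majority of all directory authorities vote to reject.

    `votes` maps a (hypothetical) authority name to True if that authority
    wants to reject the relay. Authorities missing from `votes` are counted
    as not rejecting, so abstaining never helps a rejection pass.
    """
    if total_authorities <= 0:
        raise ValueError("total_authorities must be positive")
    if len(votes) > total_authorities:
        raise ValueError("more votes than authorities")
    reject_count = sum(1 for wants_reject in votes.values() if wants_reject)
    # Strict majority: more than half of *all* authorities, not just those that voted.
    return reject_count > total_authorities // 2


if __name__ == "__main__":
    # Illustrative numbers only: 5 of 9 authorities agree to reject.
    votes = {f"auth{i}": i <= 5 for i in range(1, 10)}
    print(should_reject(votes, total_authorities=9))  # True
```

Counting against the total number of authorities, rather than only those that voted, keeps a small group of authorities from pushing a rejection through on its own.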
KAX17 and failures in the past
In December 2020, we noticed a weird pattern in our relay graphs: dozens of relays would join the network at 00:00:00 on the first of a month, only to slowly vanish again. This had been going on for months before we noticed the pattern, and when we did, we speculated that we were witnessing a bug in Tor's hibernation code.
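A pattern like this can be surfaced by bucketing relays by the time they first appear. The following is a minimal sketch, assuming a hypothetical list of per-relay first-seen timestamps in UTC (for example, extracted from archived consensuses). It treats the first hour of each month's first day as the window of interest. The function name and threshold are illustrative and not part of any Tor tool.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def month_start_join_spikes(first_seen: Iterable[datetime], threshold: int) -> dict[str, int]:
    """Count relays first seen between 00:00 and 00:59 on the 1st of a month.

    `first_seen` is a hypothetical collection of per-relay first-seen timestamps (UTC).
    Returns {"YYYY-MM": count} for months whose count reaches `threshold`.
    """
    counts = Counter(
        ts.strftime("%Y-%m")
        for ts in first_seen
        if ts.day == 1 and ts.hour == 0
    )
    return {month: n for month, n in sorted(counts.items()) if n >= threshold}


if __name__ == "__main__":
    # Synthetic data: 30 relays appearing at midnight on a month's first day, plus one unrelated relay.
    seen = [datetime(2020, 11, 1, 0, 0, 0)] * 30 + [datetime(2020, 11, 14, 9, 30)]
    print(month_start_join_spikes(seen, threshold=20))  # {'2020-11': 30}
```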
Then in early September 2021, based on our own monitoring and investigation, we saw some unusual relay additions to our network that caught our attention: relays that seemed to belong to a single operator were showing up, but they did not have a family declaration. Moreover, there was no way to contact the operator as the contact information field was empty. Based on the set of criteria for dealing with potentially bad relays we had established, we proposed that these relays be rejected from the network, and the directory authorities approved this rejection. As soon as these relays were rejected, though, similar relays kept showing up on the network which then in turn got rejected as well.
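The two signals mentioned here, relays that look like they share an operator but lack a family declaration and an empty contact field, can be combined into a simple heuristic. The sketch below is hypothetical. It groups relays by IPv4 /24 as a stand-in for "seems to belong to a single operator". That grouping is an assumption for illustration and not the Network Health team's actual criterion. The sketch does reflect one real property of Tor families: two relays are only treated as a family if each one lists the other.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Relay:
    fingerprint: str
    address: str                             # IPv4 address, e.g. "203.0.113.5"
    family: frozenset[str] = frozenset()     # fingerprints this relay lists as family
    contact: str = ""                        # operator contact info; may be empty


def mutual_family(a: Relay, b: Relay) -> bool:
    """Two relays only count as a family if each one lists the other."""
    return b.fingerprint in a.family and a.fingerprint in b.family


def suspicious_groups(relays: Iterable[Relay], min_size: int = 3) -> list[str]:
    """Flag /24 networks holding >= min_size relays with empty contact info
    and no mutual family declaration between any pair of them.

    Grouping by /24 is a hypothetical stand-in for "looks like one operator".
    """
    groups: dict = defaultdict(list)
    for relay in relays:
        net = ipaddress.ip_network(f"{relay.address}/24", strict=False)
        groups[net].append(relay)

    flagged = []
    for net, members in groups.items():
        if len(members) < min_size:
            continue
        no_contact = all(not m.contact.strip() for m in members)
        # True only if no pair in the group declares each other as family.
        no_family = all(
            not mutual_family(a, b)
            for i, a in enumerate(members)
            for b in members[i + 1:]
        )
        if no_contact and no_family:
            flagged.append(str(net))
    return sorted(flagged)


if __name__ == "__main__":
    # Synthetic relays in a documentation address range (RFC 5737).
    relays = [Relay(f"FP{i}", f"203.0.113.{i}") for i in range(1, 5)]
    print(suspicious_groups(relays))  # ['203.0.113.0/24']
```

A flag from a heuristic like this is a starting point, not a verdict. A missing family and an empty contact field are also what a merely misconfigured relay looks like, which is why the process described earlier begins by trying to work with the operator.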
At the end of October 2021, we got a tip from an anonymous Tor user who helped us eventually detect and remove a relay group that was later dubbed "KAX17". We don't know who was running those relays or what they were doing or trying to achieve (their behavior was quite different from the better-known attack where malicious exit relay operators try to perform MitM attacks on our users). However, once we kept a close eye on KAX17 relays, both the spikes at the beginning of each month and the relay flooding stopped, and it's fair to assume that the KAX17 operator was responsible for both of them as well.
The KAX17 operator was active for months, maybe years, on the network. How is that possible?
There are three important reasons for that failure on our side worth highlighting here:
Firstly, while it is always much easier to connect dots like the ones above in hindsight, our failure to do so should remind us that we invested too little effort in our bad-relay tooling over the last several years. In particular, we did not monitor the network closely enough to understand which kinds of tools we might still be missing or would need most.
Secondly, fighting and removing malicious relays requires trust between all involved parties (volunteers, Tor staff, and directory authorities) and that trust was missing from time to time, creating friction and frustration, which in turn made the work to detect and reject bad relays even harder.
Thirdly, and most importantly, we were not set up as an organization to work effectively on detecting and removing malicious relays: we lacked resources to broaden our detection mechanisms and to enhance our relay policies, network monitoring, and community outreach. Even though the previous two reasons were affected by this as well, the organizational shortcoming deserves to be a separate item on this list.
In January 2020, we launched the Network Health team at Tor with the goal of getting our work on monitoring the network and the health of our relays up to speed and organized. This included bad-relay work. While having dedicated staff working in that area is important, it is not by itself a remedy for the third item on the list above. We needed to set up policies for our day-to-day work that matched expectations related to the trust angle mentioned above. Additionally, in the past, the Tor Project teams worked in silos, and we needed to increase coordination between teams to do this work effectively. In particular, working closely with the Community team has given us another very efficient tool in our fight against malicious relays.
Beyond increasing coordination and organization, we developed new tools that help in the fight against exit relays performing MitM attacks on our users and in proactively detecting groups of potentially malicious relays. We believe that this proactive stance has significantly raised the bar for malicious operators entering the network over the past months, which is an exciting development in this arms race.
Many of the improvements discussed above come from organizing our work in new ways, like opening lines of communication and coordination between teams. These improvements are also possible because of the investment of our donors and funders. We were able to increase capacity and stability with additional funding and the support of our community. Thank you to everyone who has made a donation in the last few years; you've made it possible for us to fight back and keep users safe.
What is coming up next?
We believe we have ramped up our bad-relay work significantly during the past year and we are moving in promising directions. Coordinating with the Tor Browser team has led to Tor Browser 11.5 (due later this year) shipping with HTTPS-Only mode enabled by default. This should help tremendously in the arms race with exit relays trying to MitM user connections: while we do have several defenses in place against those attacks, we believe that HTTPS-Only mode in Tor Browser will be a game-changer, as it will strongly reduce the incentive to spin up exit relays for MitM attacks in the first place. You can try out a recent desktop alpha release where this change is already getting some testing (and please report any bugs you find so we can improve the HTTPS-Only user experience where needed).
We are particularly excited about upcoming funding for work on more tools in our bad-relay toolbox, including new and improved ways of monitoring our network, but also for building a stronger relay operator community, given that relying on monitoring tools alone is not a sufficient strategy in the bad-relay area. A stronger relay community is an essential building block in our longer-term plan to limit large-scale attacks on the network, on which we hope to make progress in the upcoming months and years.
So, stay tuned!
Please remember: if you witness malicious relays in our network, report them to bad-relays[@]lists[.]torproject[.]org for the sake of our users' safety. We need everyone to stay vigilant and help keep our users safe, as malicious operators are trying to adapt their strategies to get around our defenses.
This is a companion discussion topic for the original entry at https://blog.torproject.org/malicious-relays-health-tor-network/