What if an adversary is running all 3 nodes that your circuit has picked?

Hello all, I’m trying to make a quality post, so please bear with me.

Let’s assume that you are currently accessing website Y via Tor Browser, and that your circuit just so happens to go through 3 malicious Tor nodes. What then?

The adversary is running the malicious nodes B1, B2, and B3.

Does the traffic (B1,B2,B3) look different than the traffic (B1,C1,C2) or (B1,A1,A2) ?

Is the adversary able to PROVE beyond doubt that the traffic originated from Bob’s laptop and arrived at destination website Y? If that is the case right now, I genuinely believe it shouldn’t be tolerated, and a fix should be put on the Tor roadmap. ALL traffic going to and from Tor nodes should have more or less the same packet size, so that the adversary can’t tell where any particular connection is coming from or going to. Connections must all blend together and look the same from the adversary’s perspective, and there should be decoy traffic being sent around.

I highly suggest reading up on what Nym is trying to achieve with its mixnet, and on Mullvad VPN’s DAITA (Defense Against AI-guided Traffic Analysis) implementation, which is aimed at just that: making sure that connections look the same.

Is it possible to make all traffic between Tor nodes look the same at all times, to prevent traffic correlation by malicious nodes? Is this already planned? If it is not planned yet, can it be put on the roadmap?

This is my main criticism of Tor (despite how much I respect the project): it shouldn’t matter to users who runs the Tor nodes, because node operators shouldn’t be able to correlate any user’s traffic to any other user’s traffic.

Snowflake supposedly has passive traffic going in the background, but no one has given me a concrete answer on whether that is intentional as a defense against traffic analysis or a side effect of the WebRTC mimicry.

Hey! Why am I always the bad guy? Why can’t it be Jack or Jill or Harry? And how did you know I’m using a laptop? :slight_smile:

Seriously, what are the odds that B1, B2, and B3 are run by the same adversary? It must happen sometimes. Agreed, it should make no difference to the user.
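
To put a rough number on "what are the odds", here is a minimal sketch. It assumes each of the three positions is picked independently, in proportion to the adversary's share of the selection weight for that position. That is a simplification: real Tor path selection also refuses to put two relays from the same declared family or the same /16 subnet in one circuit, and clients keep the same guard for a long time. The function name and the 10% figures are made up for illustration.

```python
def all_three_malicious(guard_frac: float, middle_frac: float, exit_frac: float) -> float:
    """Chance that a single circuit uses the adversary's relays in all three
    positions, ASSUMING the positions are chosen independently and each
    fraction is the adversary's share of the selection weight there."""
    for frac in (guard_frac, middle_frac, exit_frac):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    return guard_frac * middle_frac * exit_frac


# Hypothetical adversary holding 10% of the weight in every position.
p = all_three_malicious(0.10, 0.10, 0.10)
print(f"{p:.4%} of circuits")
```

Under these assumptions that is roughly 1 circuit in 1000. It is small per circuit, but a client builds many circuits over time, so "it must happen" is a fair way to put it.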

I read both links. Making all packets the same size with noise makes sense like the people at DAITA suggest. Mixnet goes one step further by considering when a session starts and stops.
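
For what "all packets the same size" means in practice, here is a small sketch of chopping a payload into equal-size padded cells. Tor already moves data in fixed-size cells as far as I know, but the layout below (a 512-byte cell with a 2-byte length field) is my own illustration, not the real cell format.

```python
CELL_SIZE = 512  # illustrative cell size in bytes (assumption, not from a spec)
LEN_FIELD = 2    # bytes used to store how much of the cell is real data


def to_cells(payload: bytes) -> list[bytes]:
    """Split payload into cells that are all exactly CELL_SIZE bytes long."""
    room = CELL_SIZE - LEN_FIELD
    cells = []
    # max(..., 1) makes an empty payload still produce one all-padding cell.
    for start in range(0, max(len(payload), 1), room):
        chunk = payload[start:start + room]
        header = len(chunk).to_bytes(LEN_FIELD, "big")
        padding = bytes(room - len(chunk))  # zero bytes
        cells.append(header + chunk + padding)
    return cells


def from_cells(cells: list[bytes]) -> bytes:
    """Strip the padding again and reassemble the original payload."""
    out = bytearray()
    for cell in cells:
        used = int.from_bytes(cell[:LEN_FIELD], "big")
        out += cell[LEN_FIELD:LEN_FIELD + used]
    return bytes(out)


message = b"x" * 1000
cells = to_cells(message)
print(len(cells), {len(c) for c in cells})
```

Every cell is the same size on the wire, but the number of cells and the moments they are sent still leak, which is the start/stop problem the mixnet people worry about.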

The one thing I can think of with the start and stop is congestion. Many remark that Tor is slow because of the paths it has to take so stretching a session which should be 20 packets to 30 or 40 would create more traffic and make things even slower.

I’m thinking about how it all starts: the handshake (SYN, SYN-ACK, ACK stuff) and the setup between both ends for certificates, etc., then the eventual HTTP GET domain/frontpage and the subsequent reply. It should all be wrapped or encapsulated in continuous traffic to avoid detection. Think of all that extra bogus traffic and what it would do to speed.

So now you have gotten your front page and are reading. It’s a three minute read then you click next. So, is there continuous traffic during your read? Because, if not, then as soon as your “criminal” user starts putting out traffic it starts again at the other end. Hmmmm. Maybe little bursts of packets which go nowhere??

I’m assuming the smart folks at Karlstad U have considered this and are designing a solution.

Those VPN people only have to consider bogus traffic between them and the user. Tor has 3 more.

I think you mixed up two scenarios in your post.

As far as I understand, the links you posted describe mitigations against an adversary who isn’t running relays but is able to see the traffic between relays run by other people. This is called a traffic correlation attack, and you can find lots of research on the internet describing different scenarios.
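
A toy sketch of what such an outside observer does: count packets per time bin on the entry side and on the exit side, then check how well the two series line up. All timestamps, the 0.2 s latency and the function names are made up, and real attacks in the literature are far more refined than a plain Pearson correlation.

```python
from math import sqrt


def bin_counts(timestamps, bin_width, n_bins):
    """Count how many packets fall into each time bin."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if one is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical observations, in seconds.
entry_times = [0.1, 0.2, 0.3, 5.1, 5.2, 9.7]   # seen near the user
exit_times = [t + 0.2 for t in entry_times]    # same flow, shifted by latency
other_times = [1.5, 2.5, 3.5, 6.5, 7.5, 8.5]   # an unrelated flow

entry_bins = bin_counts(entry_times, 1.0, 10)
same = pearson(entry_bins, bin_counts(exit_times, 1.0, 10))
diff = pearson(entry_bins, bin_counts(other_times, 1.0, 10))
print(f"same flow: {same:.2f}, unrelated flow: {diff:.2f}")
```

Cover traffic and padding aim to flatten exactly these per-bin counts so the matching flow no longer stands out.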

The scenario you describe is a Sybil attack, and using cover traffic to hide patterns in the packets has no effect here, since the adversary can run non-standard software and modify any packet going out to another one of his bad relays to track its path. Depending on the software used, he may even be able to prove which traffic is coming from you. The defense against this attack is to have trusted relay operators and remove bad relays from the network.

@BobbyB
imho the latency shouldn’t be the priority, anonymization should always remain the priority, at the expense of the rest at first, and later down the road it should be improved yes.

i’m guessing with Arti coming into fruition the additional packets being sent back and forth shouldn’t be a problem performance-wise ?

@jarl

The defense against this attack is to have trusted relay operators and remove bad relays from the network.

That’s the problem: it’s all too easy for an adversary to spin up Tor nodes while posing as a new operator, so you get the illusion of diversity while one adversary controls a large part of the network. See how many Tor nodes are in Germany right now; it’s definitely a sign imo.

Manually removing bad relays from the network is a bandaid solution to a bigger problem. So yea i think relay operators should not be able to tell where traffic is coming from and going to. There could simply be checks implemented to validate that the peers are using a supported tor version, to refuse peers with modified tor software

imho the latency shouldn’t be the priority, anonymization should always remain the priority, at the expense of the rest at first, and later down the road it should be improved yes.

There are different networks with different priorities. If Tor doesn’t fit your threat model, then using something different is fine. That is no reason to change Tor.

So yea i think relay operators should not be able to tell where traffic is coming from and going to. There could simply be checks implemented to validate that the peers are using a supported tor version, to refuse peers with modified tor software

If somebody has full control over the machine running a relay, and the software/specs to run a relay are open source, this isn’t as easy as you think, as this machine is basically a black box. And if somebody has control over a relay, they can of course see the previous and the next hop, even without modifying Tor.

You describe a situation where all relays in your chosen path are compromised. I don’t think it is technically possible for any system to guarantee anonymity in this situation.

I don’t think it is technically possible for any system to guarantee anonymity in this situation.

There should be a way to achieve this, otherwise Nym’s mixnet and DAITA are chasing a false dream right now. Tor is supposed to be the leading anonymity network, so when another project proposes anonymity improvements, I think this is an opportunity to explore what hasn’t been explored yet. There’s definitely room to improve the anonymity of the protocol (in this case, stopping an adversary’s attempts at deanonymizing users by looking at their traffic patterns), rather than finding excuses to keep it in its current state.

Imo the effort (to research, to formulate a proof of concept and to implement the change) can be split into these 3 parts:

Context: The adversary is running nodes B1,B2,B3,B4,B5, and B6

Part 1: How can the tor protocol enforce that the traffic coming from Alice,Bob and Charlie’s tor browser / tor daemons to the B1 node look the same at all times ?

Part 2: How can the Tor protocol enforce that the traffic coming from ANY Tor node to ANY other Tor node looks the same? (That way there’s no way to differentiate where Alice’s / Bob’s / Charlie’s connections are going through.)

It is the protocol itself that should enforce the “blackbox” so that even the node runners can’t follow where connections are going based on the traffic shape / patterns.

Part 3: How can the Tor protocol enforce that the traffic coming from node B6 to the hidden services X, Y and Z looks the same? (That way there’s no way to differentiate where the connections to hidden service X / Y / Z are going through.)

If the entire protocol can make it so that an adversary cannot possibly figure out where connections come from and go to, even when running a massive amount of tor nodes, you’ll massively improve the users’ trust in the network at large, and reduce to 0 the low percentage chance of deanonymization (due to having a circuit pick all malicious nodes)

As anonymity is about making sure that users can perfectly blend in together without any possibility of differentiating one from the other, Tor is lacking anonymity-wise in the case where your connection unluckily goes only through malicious nodes that are run by the same adversary.

The only thing that a node operator should be able to know is that connections are coming from and going to [this set of thousands of IPs] without any possibility to differentiate one from the other. Imo if Tor manages to solve that problem, the Anonymity is going to be improved many times over.

(feel free to correct me if i’m wrong btw)

For some inspiration, I recommend looking at the Monero project: despite already having great anonymity-preserving features like Dandelion++ (which makes it hard for an adversary to figure out which IP a transaction came from), they went ahead and started implementing an even better anonymity-preserving solution called FCMP++ to increase anonymity many times over.

Anyway, I think this is something to consider in depth. It may not be easy to implement, but the potential added value is immense.

You misunderstood the problem they try to solve. The goal with bogus traffic isn’t to make it impossible for a relay operator to correlate connections but to make it harder for someone who is watching the connection between relays. So the adversary in this scenario isn’t an “evil” relay but the ISP of a “good” relay.

This is nothing new and there were some design choices made. So as i said, if you prefer stronger anonymity over lower latency this is a case of right tool for the right job, not necessarily a problem with tor.

It can’t (on the current Internet), since there is an IP address and port associated with those connections. The only way to solve this problem is, afaik, to send all traffic to all users, which would be massive overhead.

Node runners don’t have to search for patterns. They already have all the information, since they know which relay they got a packet from, and once they decrypt it they know which destination the packet should be sent to. Looking for traffic patterns is something an evil ISP, or some other entity who can watch the internet, has to do if they don’t have access to the relays themselves.
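
To make that concrete, here is a simplified sketch of what colluding relays can do just by pooling what each of them necessarily knows: where a circuit came from and where it goes next. The record layout, the relay names and the circuit IDs are all hypothetical, and this is a model of the idea, not of what the tor software logs.

```python
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    relay: str             # relay that holds this record
    prev_hop: str          # where the circuit arrived from
    in_id: int             # circuit ID on the inbound link
    next_hop: str          # where the circuit continues to
    out_id: Optional[int]  # circuit ID on the outbound link (None at the exit)


def trace(records, first_relay, client):
    """Follow a client's circuits through records pooled by colluding relays."""
    paths = []
    for start in records:
        if start.relay != first_relay or start.prev_hop != client:
            continue
        path, cur = [client, start.relay], start
        while cur.out_id is not None:
            # Look for the matching record on the next relay, if it colludes.
            nxt = next((h for h in records
                        if h.relay == cur.next_hop
                        and h.prev_hop == cur.relay
                        and h.in_id == cur.out_id), None)
            if nxt is None:
                break  # next relay is not ours: the trail goes cold here
            path.append(nxt.relay)
            cur = nxt
        path.append(cur.next_hop)
        paths.append(path)
    return paths


records = [
    Hop("B1", "bob-laptop", 17, "B2", 42),
    Hop("B2", "B1", 42, "B3", 7),
    Hop("B3", "B2", 7, "website-y", None),
    Hop("B1", "alice-laptop", 18, "C1", 99),  # C1 is an honest relay
]
print(trace(records, "B1", "bob-laptop"))
print(trace(records, "B1", "alice-laptop"))
```

Bob's circuit is traced end to end without looking at a single traffic pattern, while Alice's trail stops at the honest relay C1. That is why padding and decoys don't help against the all-malicious-circuit case.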