[Proposal] mimicry-pt: An LSTM-based Pluggable Transport mimicking YouTube/game site traffic

Here are some unsorted references on traffic shaping or mimicry of video streams. It’s a good idea to read a few of these and examine how relevant the ideas are to your system. A good place to start would be section 6 of the “Protozoa” paper, which similarly evaluates against a classifier.

“Poking a Hole in the Wall: Efficient Censorship-Resistant Internet Communications by Parasitizing on WebRTC” 2020
https://github.com/net4people/bbs/issues/55

“DeltaShaper: Enabling Unobservable Censorship-resistant TCP Tunneling over Videoconferencing Streams”

“CovertCast: Using Live Streaming to Evade Internet Censorship” 2016

“Facet: Streaming over Videoconferencing for Censorship Circumvention” 2014

“Enhancing Tor’s performance using real-time traffic classification” 2012

“Voiceover: Censorship-Circumventing Protocol Tunnels with Generative Modeling” 2023
Generative modeling as a source of traffic schedules.

“Learning to Behave: Improving Covert Channel Security with Behavior-Based Designs” 2022
The term “behavioral independence”.

“Maybenot: A Framework for Traffic Analysis Defenses” 2023

“Beauty and the Burst: Remote Identification of Encrypted Video Streams” 2017

“Extended Abstract: Traffic Shaping for Network Protocols: A Modular and Developer-Friendly Framework” 2025


Do you mean: can you "spoof" a VK CDN connection but actually connect to a foreign server? Or renting a VK IP to do that? Or something else?

The first option works. There is no check of whether the SNI matches the "real IP" of the service, probably because it would add too much latency.

But I don’t know about that. Is this similar?

Thank you so much for the references — really helpful.

Three papers caught my eye in particular: “Voiceover” (generative modeling for traffic schedules), “Learning to Behave” (behavioral independence), and the 2025 extended abstract (to get a sense of the current SOTA). I’ll start with these.

Thank you for the clarification!

The core idea is statistical rather than infrastructural. The hypothesis is that if IP checks are only triggered by suspicious traffic, making the traffic pattern statistically indistinguishable from VK CDN traffic could let it bypass that layer entirely: the censor would treat it as routine traffic and never escalate to IP verification.
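To make the hypothesis concrete, here is a toy model of the assumed two-stage pipeline: a cheap pattern classifier scores every flow, and only flows above a threshold are escalated to a more expensive SNI/IP consistency check. Everything in it is hypothetical (the `Flow` fields, `ESCALATION_THRESHOLD`, the `KNOWN_SERVICE_IPS` table, the placeholder domain and RFC 5737 documentation addresses); nothing is known here about how the real pipeline is structured.

```python
from dataclasses import dataclass

# Hypothetical censor model (an assumption, not the real TSPU design):
# a cheap pattern classifier runs on every flow, and only flows it flags
# are escalated to an expensive SNI/IP consistency check.

ESCALATION_THRESHOLD = 0.8  # hypothetical classifier cutoff

# Hypothetical table of the IPs each cover domain really resolves to
# (placeholder domain, RFC 5737 documentation addresses).
KNOWN_SERVICE_IPS = {"cover-cdn.example": {"192.0.2.10", "192.0.2.11"}}


@dataclass
class Flow:
    sni: str              # domain claimed in the TLS ClientHello
    dst_ip: str           # address the client actually connects to
    pattern_score: float  # hypothetical classifier output: 0.0 looks like cover traffic, 1.0 clearly anomalous


def censor_decision(flow: Flow) -> str:
    """Return "allow" or "block" under the assumed two-stage model."""
    # Stage 1: cheap statistical check applied to every flow.
    if flow.pattern_score < ESCALATION_THRESHOLD:
        return "allow"  # never escalated, so the SNI/IP mismatch is never examined
    # Stage 2: consistency check, only for escalated flows.
    if flow.dst_ip not in KNOWN_SERVICE_IPS.get(flow.sni, set()):
        return "block"
    return "allow"


shaped = Flow("cover-cdn.example", "203.0.113.5", pattern_score=0.3)
unshaped = Flow("cover-cdn.example", "203.0.113.5", pattern_score=0.95)
print(censor_decision(shaped))    # allow
print(censor_decision(unshaped))  # block
```

Keeping the two stages separate shows the condition the idea depends on: shaping only helps if the expensive check is gated behind the cheap one. If the consistency check is skipped entirely, stage 2 never blocks and only stage 1 matters.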

That said, I don’t have visibility into how Russian censorship actually pipelines its decisions, so whether major services are whitelisted at the IP level regardless of traffic pattern is an open question. The effectiveness depends heavily on that internal structure.

The most realistic use case I see is using mimicry to strengthen something like WebTunnel rather than as a standalone solution.

But won’t it be faster if it’s standalone?

Great paper on that: https://censoredplanet.org/papers/tspu-imc22.pdf

Essentially, different sites are blocked by different "SNI" algorithms, so different traffic can be blocked in different ways. Most blocks are SNI-only, e.g. Twitter, YouTube, and Discord (Discord is probably hard to block completely because it is hosted on Cloudflare). Blocks of that kind can be bypassed with advanced circumvention/spoofing software. Some obfs4 bridges, once found, are blocked by IP/port, and Tor guard relays are blocked the same way automatically. Incoming connections are not blocked (even if your IP is blacklisted, you can still connect to a RU IP). In-country traffic is also subject to DPI inspection and blocking.

Some WebTunnel bridges were manually SNI-blocked; it is not clear whether IP blocking was also attempted on them.

Recently, WhatsApp was IP-blocked, as were some Telegram sites. Facebook remains accessible with SNI spoofing. There is no check that the IP and domain match. Some (foreign) ASNs get throttled, although this is fairly inconsistent and may be lifted at random.

After the recent Telegram block, there were reports that the DPI infrastructure is overloaded, which is letting some WhatsApp traffic slip through.

This is very helpful, thank you. The fact that IP/SNI consistency checks are not being performed is a crucial point. It means SNI-level cover identities work without routing through the actual service's infrastructure.

The WebTunnel SNI blocking example is particularly interesting. If we assume these bridges are first flagged by traffic pattern analysis and then manually SNI-blocked, statistical shaping would prevent the initial flagging, and the SNI blocking would not occur. This is precisely the gap this approach is trying to fill.

The report of DPI overload after the Telegram block also suggests real capacity limits in the inspection infrastructure, which supports modeling inspection as probabilistic rather than exhaustive.
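A probabilistic inspection model can be sketched with one formula plus a Monte Carlo sanity check. The assumptions are deliberately simple and hypothetical: each connection is independently sampled for inspection with probability `p_inspect` (capacity-limited), and an inspected connection of the transport is flagged with probability `p_detect` (the classifier's detection rate). The probability that a bridge survives `n` connections unflagged is then (1 - p_inspect * p_detect)^n. Real censors are not this memoryless, so treat it as a back-of-the-envelope model only.

```python
import random


def survival_probability(p_inspect: float, p_detect: float, n_connections: int) -> float:
    """Closed form: probability that none of n connections is flagged.

    Assumes each connection is independently sampled for inspection with
    probability p_inspect and, if inspected, flagged with probability p_detect.
    """
    return (1.0 - p_inspect * p_detect) ** n_connections


def simulate_survival(p_inspect: float, p_detect: float, n_connections: int,
                      trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # A connection is flagged only if it is both inspected and detected.
        flagged = any(
            rng.random() < p_inspect and rng.random() < p_detect
            for _ in range(n_connections)
        )
        if not flagged:
            survived += 1
    return survived / trials


# Overloaded DPI (10% of flows sampled) vs. full coverage, for a weak and a strong mimic.
for p_inspect in (0.1, 1.0):
    for p_detect in (0.5, 0.05):
        print(p_inspect, p_detect, round(survival_probability(p_inspect, p_detect, 100), 3))
```

Only the product p_inspect * p_detect enters the formula, so a tenfold increase in inspection coverage has to be matched by a tenfold drop in the classifier's detection rate to keep the same survival odds; in the loop above, (0.1, 0.5) and (1.0, 0.05) give the same result.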

They are manually blocked, if found.

There was recently a report that they want to build a DPI network with 30 times the capacity of all RU internet traffic.

I see, so the WebTunnel blocking was entirely manual, not pattern-triggered. This changes my perspective somewhat.

The 30x DPI upgrade is a crucial data point. Once capacity constraints are no longer an issue, relying on infrastructure overload as a side effect will stop working. Statistical indistinguishability at the pattern level will then be the only reliable long-term strategy.
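One common way to put a number on "statistical indistinguishability" for a single feature (packet sizes, inter-arrival times) is the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two empirical CDFs; 1 - D can then be read as a similarity score. This is an illustrative choice, not necessarily the metric behind the percentages in the GitLab issue, and the sample values below are synthetic.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs of the two samples. Both are step
    functions that only change at sample points, so checking every observed
    value is enough to find the maximum gap.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map D to a 0..1 score where 1.0 means identical empirical distributions."""
    return 1.0 - ks_statistic(a, b)


# Synthetic packet sizes (bytes), made up purely for illustration.
cover_sizes = [1350, 1400, 1400, 1420, 1460, 1500]
shaped_sizes = [1340, 1400, 1410, 1420, 1460, 1500]
print(round(similarity(cover_sizes, shaped_sizes), 3))
```

A single-feature test like this is a weak stand-in for a censor: a real classifier combines many features (sizes, timing, burst structure), which is why evaluating against a classifier, as in the Protozoa paper, is the stronger check.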

Perhaps mimicking a niche service would be a better long-term strategy.

Thank you for your suggestions and feedback. Based on your advice, I’ve created an issue on Tor GitLab to continue the discussion and more formally track progress.

Issue link: "PoC: generative LSTM traffic shaping achieves 92–96% statistical similarity to YouTube traffic after network traversal" (#186), The Tor Project / Anti-censorship / Team, GitLab

The system framework (bridge, client, protocol framing) is publicly available in the repository linked in the issue. Note that the GitHub repository linked in my earlier post has been made private: I had shared it temporarily, then moved the public release to GitLab and closed the GitHub one to avoid confusion. Because of dual-use concerns the trained models are kept private, but I'm happy to add anyone who wants to review the full implementation as a project member.

I’d appreciate your participation in the GitLab discussion!

We should be wary of their statements. The DPI is already overloaded with filtering fake zapret packets. I saw a statistic somewhere that most BitTorrent traffic contains traces of packet modification.

Most likely, the focus will be on internet isolation and administrative pressure.

Yes, enforcing it is probably much harder than talking about it, so I don’t know if we can expect that from them by 2030.

@unic3rn @reseacher

That’s interesting. If DPI is already suffering from noise issues, a bridge acting like a proxy for actual services might be more practical than expected.

And indeed, increasing capacity 30-fold by 2030 seems nearly impossible.

An indirect sign of problems with traffic filtering is that VPN detection has been delegated to businesses (Ozon, VK, etc.) through their apps. This matters less for Tor Browser.

Administrative liability for VPN use is even being considered, apparently as a fallback if these measures fail to stop VPN use.