Arti 1.0.0 is released: Our Rust Tor implementation is ready for production use

shadykaty · September 7, 2022, 8:06am

Some thoughts / a bit of context about performance comparability between Rust and C, for those not familiar.

One nice thing about Rust’s compiler is that it is built on LLVM. For those not familiar, LLVM is a compiler infrastructure project. It provides an intermediate form called Intermediate Representation, or IR, and a set of optimization passes that operate on IR. So if you write a compiler front end that produces IR, you get LLVM’s optimizations and code generation for free. This is a good thing for new systems langs, because they can use many of the same optimizations that make C fast when it is compiled with clang.

If you have some Rust code and some perfectly equivalent C code, and compile them using rustc and clang (both of which use LLVM as their back end), you will get the same, or very nearly the same, machine code.

The subtle thing often missed in discussions of “X lang has C-like performance” is that code which looks equivalent usually isn’t, due to differences in the languages’ semantics. One of the ways Rust differs from C is that it inserts runtime bounds checks on things like array and slice indexing, which C only checks if you wrote the check yourself. So the Rust build may contain extra branches. Those branches may never be taken in practice, but they can still impede performance. If your C isn’t bounds checking (and doesn’t need to) and your Rust is, your Rust is doing extra work, and that work has a nonzero cost. But:

- the checks can be explicitly guaranteed to be unreachable (for example with unsafe unchecked access), which removes the extra code and gives you identical perf
- nearly all of the overhead of a check comes from branch mispredictions, and since the failing branch should never or almost never be taken, mispredictions are rare
- if the code in question is not executed very frequently, the average overhead over the process’s lifetime will be negligible
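
To make the first point concrete, here is a minimal Rust sketch of three ways to sum the first `n` elements of a slice. The function names are made up for illustration and have nothing to do with Arti's code. Whether the optimizer actually removes the checks in the first version depends on what it can prove about `n`, so treat the comments as the general idea rather than a guarantee about generated code.

```rust
/// Indexed loop: each `data[i]` carries a bounds check unless the
/// optimizer can prove that `i < data.len()` always holds.
pub fn sum_indexed(data: &[u32], n: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..n {
        total = total.wrapping_add(data[i]); // panics if i >= data.len()
    }
    total
}

/// Hoisted check: slicing once performs a single bounds check up front,
/// and iterating over the resulting slice needs no per-element check.
pub fn sum_hoisted(data: &[u32], n: usize) -> u32 {
    let head = &data[..n]; // panics once here if n > data.len()
    let mut total = 0u32;
    for x in head {
        total = total.wrapping_add(*x);
    }
    total
}

/// Explicitly unchecked: the programmer promises the index is in range,
/// so no check is emitted for the access itself.
pub fn sum_unchecked(data: &[u32], n: usize) -> u32 {
    assert!(n <= data.len()); // one check, outside the loop
    let mut total = 0u32;
    for i in 0..n {
        // SAFETY: i < n and n <= data.len() was asserted above.
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}
```

All three return the same result for valid input. The second version is usually the one to reach for first, since it stays in safe Rust and moves the check out of the hot loop.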

Another tricky thing about the “C-like performance” we often demand from other langs is that writing performant C is actually pretty hard. You can’t just write some working C and be done; your C has to be written in a way that lets the compiler produce sane code. The relative lack of high level abstractions in C is both a blessing and a curse. Idioms provided by a language are usually highly optimized. C’s lack of high level abstractions leaves many idioms up to the author, and if the author isn’t an expert, their code will be worse and less performant than if they’d used a high level abstraction someone else wrote. So while Rust’s abstractions may be worse than expertly written C code, they may also be better than an average programmer’s C code. High level abstractions can be a performance improvement - they aren’t necessarily a performance detriment.
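
As a small illustration of an abstraction that can help rather than hurt, here is a hedged Rust sketch of a dot product written two ways. The function names are hypothetical. The iterator version never indexes, so there is no per-element bounds check to eliminate in the first place, while the hand-written loop relies on the optimizer to reason about `b[i]`.

```rust
/// Hand-rolled loop: `b[i]` needs a bounds check, because nothing in the
/// loop condition says that `b` is at least as long as `a`.
pub fn dot_indexed(a: &[u64], b: &[u64]) -> u64 {
    let mut acc = 0u64;
    for i in 0..a.len() {
        acc += a[i] * b[i]; // panics if b is shorter than a
    }
    acc
}

/// Iterator version: `zip` stops at the shorter input, so no element
/// access can go out of range and no index check is needed.
pub fn dot_zipped(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

Note that the two differ in semantics when the lengths don't match: the indexed version panics if `b` is shorter than `a`, while the zipped version silently truncates. That is exactly the kind of semantic difference that makes "equivalent code" comparisons tricky.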

tl;dr: As someone who has written Rust and C and has a light obsession with performant code, I would expect the rewrite to be a little better in some places, a little worse in others, and roughly equivalent on the whole. Spots with drastic performance losses can almost certainly be optimized back to where they were before. While I personally don’t enjoy writing Rust and it wouldn’t be my first choice for a rewrite, I still think that on the whole this is an improvement to the Tor project.