[tor-relays] the case for relay state persistency

Hello everyone,

In our diskless infrastructure [0][1], we store all configurations generated by the node inside the TPM's memory. These memories are very limited (1-9k), but more than sufficient for saving the long-term keys of the relays.

However, we have a problem: when rebooting the machines, we lose the state file and, with it, the bandwidth measurements of the various relays. This is quite annoying because upon reboot, our relays all start up announcing 0 B/s, and it takes them days to regain their bandwidth.

Unfortunately, these files are too large to be stored in the TPM. We would like to be able to generate the BWHistory configurations based on some arbitrary values that we write into the node configurations, and then let the node handle redoing the measurements upon reboot. However, among all the configurations, it's not clear to us which ones are important for our needs and which others we can leave empty.

In general, we would like to open a discussion, in view of Arti, about the organization of the datadir. Currently, it's not very clear which files are important for runtime, which are for persistence, etc. From our point of view, it would be quite useful to be able to group persistent data into just a few files, for example, one for all keys and one for states/configurations.

Thank you,

g

[0]: Patela: A basement full of amnesic servers - Osservatorio Nessuno
[1]: Patela v2: From certificates to hardware - Osservatorio Nessuno


_______________________________________________
tor-relays mailing list -- tor-relays@lists.torproject.org
To unsubscribe send an email to tor-relays-leave@lists.torproject.org


In our diskless infrastructure [0][1], we store all configurations generated
by the node inside the TPM's memory. These memories are very limited (1-9k),
but more than sufficient for saving the long-term keys of the relays.

However, we have a problem: when rebooting the machines, we lose the state
file and, with it, the bandwidth measurements of the various relays. This is
quite annoying because upon reboot, our relays all start up announcing 0
B/s, and it takes them days to regain their bandwidth.

Unfortunately, these files are too large to be stored in the TPM. We would
like to be able to generate the BWHistory configurations based on some
arbitrary values that we write into the node configurations, and then let
the node handle redoing the measurements upon reboot. However, among all the
configurations, it's not clear to us which ones are important for our needs
and which others we can leave empty.

I think the diskless relay idea is great and we should work to support it.

Part of the challenge with making the bandwidth entries in the relay
descriptor configurable is that "bandwidth inflation attacks", where
a relay claims a high bandwidth history when it didn't actually see
that level of traffic, are still effective against our naive bandwidth
measurement designs.

That's why we have avoided making it easier for bad relay operators to
just configure a number. But the bar is not high -- you could change the
source quite easily to just say a larger number, but also you could just
change the numbers in the state file.

For background on more robust bandwidth measurement designs, which are
still only research papers, see

I was going to offer a quick patch with a new torrc option for configuring
what bandwidth you want to claim (see bwhist_bandwidth_assess()), but I
realized that if you are willing to do a bit of scripting (and I think
you already have some scripting in your tool), crafting a state file which
induces a given claimed bandwidth should be easy too. But then I started
messing about with a proof-of-concept -- to show just how easy it is of
course :wink: -- and I found some confusing things that look like bugs. I'll
plan to follow up here if I manage to sort them out and/or file tickets.

In general, we would like to open a discussion, in view of Arti, about the
organization of the datadir. Currently, it's not very clear which files are
important for runtime, which are for persistence, etc. From our point of
view, it would be quite useful to be able to group persistent data into just
a few files, for example, one for all keys and one for
states/configurations.

That goal makes sense to me, but depending on how we solve it, it will
come with tradeoffs. Tor's current state file has a combination of
client-side info (e.g. CircuitBuildTimeBin, Guard, TotalBuildTimes),
relay-side info (e.g. BWHistory*, LastRotatedOnionKey, Accounting*,
TransportProxy) and items that apply to both or more (e.g. Dormant,
MinutesSinceUserActivity). Putting it all in one place reflects Tor's
peer-to-peer design where a single Tor instance can play multiple roles,
and you lose features if you try to partition state into too few roles
-- for example I've seen use cases where a Tor client offering an onion
service relies on AccountingMax.
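
The grouping described above can be sketched as a small lookup. This is only an illustration: the role labels and the `classify` helper are hypothetical, and the patterns cover just the example entries named in this message, not the full list in doc/state-contents.txt.

```python
from fnmatch import fnmatchcase

# Example entries only, taken from the message above; not exhaustive.
# The role labels ("client", "relay", "both") are hypothetical names.
ROLE_PATTERNS = {
    "client": ["CircuitBuildTimeBin", "Guard", "TotalBuildTimes"],
    "relay": ["BWHistory*", "LastRotatedOnionKey", "Accounting*", "TransportProxy"],
    "both": ["Dormant", "MinutesSinceUserActivity"],
}


def classify(keyword):
    """Return the role label for a state-file keyword, or "unknown"."""
    for role, patterns in ROLE_PATTERNS.items():
        # fnmatchcase is case-sensitive, so "Guard" only matches "Guard".
        if any(fnmatchcase(keyword, pattern) for pattern in patterns):
            return role
    return "unknown"
```

Anything that comes back as "unknown" is simply an entry not mentioned here, and would need to be looked up in the documentation before deciding whether it can be dropped.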

More generally, all of the entries in the state file really are for
persistence, sometimes with security implications (like Guard), sometimes
with network health implications (like CircuitBuildTimeBin). You can read
more about the current state lines in doc/state-contents.txt in your
tor git checkout. And you can read more about the files in your DataDirectory in
the FILES section at the bottom of 'man tor'.

Thinking more about it... I think we've already done much of what you
requested, in that we've consolidated everything you should want for
persistence (besides the keys/ directory) in the state file. (Exceptions
are if you're an onion service then you want to keep your onion service
keys, and if you offer or use a pluggable transport you'll want to
consider your pt_state directory, and if you're a v3 directory authority
then there are a bunch more files but there are only 9 of those.)

So: do you want better documentation of state entries, or better
partitioning of them by roles, or maybe this is more "can you just make
them use less total space"? :slight_smile:

--Roger


On Mon, Feb 16, 2026 at 09:55:24AM +0100, Gilberto via tor-relays wrote:


That goal makes sense to me, but depending on how we solve it, it will
come with tradeoffs. Tor’s current state file has a combination of
client-side info (e.g. CircuitBuildTimeBin, Guard, TotalBuildTimes),
relay-side info (e.g. BWHistory*, LastRotatedOnionKey, Accounting*,
TransportProxy) and items that apply to both or more (e.g. Dormant,
MinutesSinceUserActivity). Putting it all in one place reflects Tor’s
peer-to-peer design where a single Tor instance can play multiple roles,
and you lose features if you try to partition state into too few roles
– for example I’ve seen use cases where a Tor client offering an onion
service relies on AccountingMax.

More generally, all of the entries in the state file really are for
persistence, sometimes with security implications (like Guard), sometimes
with network health implications (like CircuitBuildTimeBin). You can read
more about the current state lines in doc/state-contents.txt in your
tor git checkout. And you can read more about the files in your DataDirectory in
the FILES section at the bottom of ‘man tor’.

Thinking more about it… I think we’ve already done much of what you
requested, in that we’ve consolidated everything you should want for
persistence (besides the keys/ directory) in the state file. (Exceptions
are if you’re an onion service then you want to keep your onion service
keys, and if you offer or use a pluggable transport you’ll want to
consider your pt_state directory, and if you’re a v3 directory authority
then there are a bunch more files but there are only 9 of those.)

So: do you want better documentation of state entries, or better
partitioning of them by roles, or maybe this is more “can you just make
them use less total space”? :slight_smile:

Thank you for the detailed response, I’ll start from your conclusions. I hadn’t found the documentation about the state file and it looks excellent. I think the point is that currently it’s not intuitive to understand what needs to be saved and in which context. As you rightly pointed out, for us it might be convenient to script the generation of the file from some data we save periodically, but in light of this, I think it would be useful to document:

  • which fields are mandatory (if absent, could they prevent tor from starting)
  • which fields, if duplicated, can be shortened to the last value (for example CircuitBuildTimeBin)

Basically, if I take a state file from one of our relays, I have a ~9k file and I need to reduce it to <5k net of compression, while trying to lose the least amount of functionality possible. What are the effects of deleting the various CircuitBuildTimeBin and Guard entries that take up 90% of the file?
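
To make the size question concrete, here is a minimal sketch of how one might measure which keywords dominate a state file and what is left after dropping some of them. It assumes the state file is plain text with one "Keyword value" entry per line and "#" comment lines; the function names are hypothetical, and zlib is only a stand-in for whatever compression is actually used. It says nothing about whether dropping a given entry is safe, which is the open question above.

```python
import zlib
from collections import Counter


def bytes_per_keyword(state_text):
    """Tally how many bytes each keyword's lines occupy (newline included)."""
    sizes = Counter()
    for line in state_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments (assumed to start with "#").
        if not stripped or stripped.startswith("#"):
            continue
        keyword = stripped.split(None, 1)[0]
        sizes[keyword] += len(line.encode("utf-8")) + 1
    return sizes


def drop_keywords(state_text, keywords):
    """Return the text without the lines whose keyword is in `keywords`."""
    kept = []
    for line in state_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] in keywords:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def compressed_size(state_text):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(state_text.encode("utf-8"), 9))
```

For example, `bytes_per_keyword(text).most_common(5)` lists the five largest keywords, and `compressed_size(drop_keywords(text, {"CircuitBuildTimeBin"}))` gives the compressed size without the build-time histogram.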

Hello.

Roger Dingledine wrote:

I was going to offer a quick patch with a new torrc option for configuring
what bandwidth you want to claim (see bwhist_bandwidth_assess()), but I
realized that if you are willing to do a bit of scripting (and I think
you already have some scripting in your tool), crafting a state file which
induces a given claimed bandwidth should be easy too.

Does this mean that one solution for the extremely poor consensus weight
for non-EU relays is to increase the reported bandwidth in the state
file? Assuming, of course, that the relay *actually* has sufficient
network capacity and is not malicious. I hate that the vast majority of
my relay traffic comes from three NL relays, despite the family having
40 members.

Regards,
forest


forest-relay-contact--- via tor-relays:

Hello.

Roger Dingledine wrote:

I was going to offer a quick patch with a new torrc option for configuring
what bandwidth you want to claim (see bwhist_bandwidth_assess()), but I
realized that if you are willing to do a bit of scripting (and I think
you already have some scripting in your tool), crafting a state file which
induces a given claimed bandwidth should be easy too.

Does this mean that one solution for the extremely poor consensus weight
for non-EU relays is to increase the reported bandwidth in the state
file?

No. This risks you getting blocked from the network due to the bandwidth inflation attack Roger describes in a previous post to this thread.

Georg