[tor-relays] Exit Relays: What is your DNS timeout rate?

Hi,

The upcoming relayor release will contain new Prometheus
alert rules that should help operators detect, mitigate, and prevent
some common operational issues.

One of these rules will alert on high DNS timeout rates [1] on exit relays.
Such timeouts should be investigated and resolved to improve the Tor Browser experience for users.

To choose a sensible default alert threshold, it would help to know
the timeout rates that multiple exit operators currently see.

So if you feel up to it, you can send me (off-list) your current timeout rates (as a 30-day graph).

If you run tor exits with relayor and MetricsPort enabled, you can use these Prometheus queries:

timeout percentage by server:
(sum by (instance)(rate(tor_relay_exit_dns_error_total{reason="tor_timeout"}[15m])))/(sum by (instance)(rate(tor_relay_exit_dns_query_total[15m])))*100

DNS query rate:
sum by (instance)(rate(tor_relay_exit_dns_query_total[15m]))
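
To illustrate what the percentage query works out to, here is a minimal Python sketch of the arithmetic. The counter values are made up, the function names are hypothetical, and the rate calculation is a simplification: it ignores the counter-reset handling and extrapolation that Prometheus' rate() performs.

```python
def per_second_rate(first_sample, last_sample, window_seconds):
    """Simplified stand-in for PromQL rate(): counter increase per second.

    Assumption: no counter resets inside the window and no extrapolation.
    """
    return (last_sample - first_sample) / window_seconds


def timeout_percentage(timeouts_first, timeouts_last,
                       queries_first, queries_last,
                       window_seconds=15 * 60):
    """Share of DNS queries that ended in a tor_timeout error, in percent."""
    query_rate = per_second_rate(queries_first, queries_last, window_seconds)
    if query_rate == 0:
        # No queries in the window: the ratio is undefined.
        return None
    timeout_rate = per_second_rate(timeouts_first, timeouts_last, window_seconds)
    return timeout_rate / query_rate * 100


if __name__ == "__main__":
    # Made-up samples taken 15 minutes apart on one instance:
    # 90 new timeouts out of 4500 new queries, i.e. roughly 2 percent.
    print(timeout_percentage(1000, 1090, 200000, 204500))
```

The window length cancels out in the ratio, so the result is simply the number of new timeouts divided by the number of new queries over the same 15 minutes.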

If you run exits without relayor, you can use these queries:

timeout percentage by job:
(sum by (job)(rate(tor_relay_exit_dns_error_total{reason="tor_timeout"}[15m])))/(sum by (job)(rate(tor_relay_exit_dns_query_total[15m])))*100

total DNS query rate:
sum(rate(tor_relay_exit_dns_query_total[15m]))
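
If you would rather export the 30-day series as data than as a graph, the following Python sketch builds a range query URL for the Prometheus HTTP API (the /api/v1/query_range endpoint). The server address http://localhost:9090, the one-hour step, and the function name are assumptions for illustration, so adjust them to your setup.

```python
import time
from urllib.parse import urlencode

# The "without relayor" timeout percentage query from above.
TIMEOUT_PERCENT_QUERY = (
    '(sum by (job)(rate(tor_relay_exit_dns_error_total{reason="tor_timeout"}[15m])))'
    '/(sum by (job)(rate(tor_relay_exit_dns_query_total[15m])))*100'
)


def build_range_url(base_url, query, days=30, step="1h", now=None):
    """Build a query_range URL covering the last `days` days.

    base_url and step are assumptions; `now` (unix seconds) can be
    passed in to make the result reproducible.
    """
    end = int(now if now is not None else time.time())
    start = end - days * 24 * 60 * 60
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Hypothetical local Prometheus server; fetch the printed URL with
    # curl or a browser to get the JSON time series.
    print(build_range_url("http://localhost:9090", TIMEOUT_PERCENT_QUERY))
```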

kind regards,
nusenu

[1] prometheus: alert on high DNS timeout rate · nusenu/ansible-relayor@9b69375 · GitHub


--
https://nusenu.github.io
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays