Polling Roblox private servers without getting rate-limited

Problem

What's actually being polled

SP-Legends is a Roblox community hub built around +1 Skill Point Legends. The visible feature on the home page is the live private-server tracker: a list of community-shared private servers with current player counts, refreshing in near-real-time. "Near-real-time" sets a budget. The Roblox API is the upstream; rate limits set the ceiling.

Cadence

Per-endpoint cadence: 20 s fast / 40 s slow

Picking the polling interval is the most consequential design decision. Users cannot tell the difference between "5 seconds stale" and "20 seconds stale" on a list of player counts. They can tell the difference between "live" and "broken because we got rate-limited", which is what too-fast polling eventually buys you. The interval is also bounded below by the upstream's own update cadence: there is no point pulling fresher than the API is willing to serve.

Two endpoints feed the live page, and they don't change on the same timescale, so they don't share a tick:

Fast: 20 s. The paginated /private-servers list. This is what users actually watch refresh: per-server populations, who's in which server. Freshness here is the user-visible feature.
Slow: 40 s. The aggregate /games?universeIds=… total-player count. This number shifts over minutes, not seconds, and isn't worth a request every 20 seconds; it's polled every other fast tick and the previous value is reused in between.

Twenty seconds is the smallest fast cadence that:

Looks live in human perception.
Stays well below any plausible per-IP rate limit on the endpoints I touch.
Survives a 2x or 3x burst (e.g. a manual refresh on top of the loop) without tripping anything.

Polling the slow endpoint on every other tick halves its per-hour request count relative to a shared 20-second tick (90 requests instead of 180) without affecting freshness on the page.
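
One way to express the split is a single 20-second tick with a counter, where the slow endpoint is only due on every second tick. The following is a minimal std-only sketch, not the site's actual code: `Due`, `due_on_tick`, and `requests_per_hour` are hypothetical names introduced for illustration.

```rust
use std::time::Duration;

/// Fast cadence from the text: the private-servers list refreshes every 20 s.
const FAST_TICK: Duration = Duration::from_secs(20);
/// The slow endpoint is polled on every Nth fast tick (2 gives 40 s).
const SLOW_EVERY: u64 = 2;

/// Which endpoints are due on a given fast tick (hypothetical type).
#[derive(Debug, PartialEq)]
struct Due {
    private_servers: bool,
    total_players: bool,
}

/// Decide what to fetch on tick number `tick` (0-based).
/// The list is always due; the aggregate is due on every other tick,
/// and the previous value is reused in between.
fn due_on_tick(tick: u64) -> Due {
    Due {
        private_servers: true,
        total_players: tick % SLOW_EVERY == 0,
    }
}

/// Requests per hour for an endpoint polled at a fixed, non-zero interval.
fn requests_per_hour(interval: Duration) -> u64 {
    3600 / interval.as_secs()
}
```

Under these assumptions the list endpoint sees 180 requests per hour and the aggregate sees 90, which is where the halving comes from.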

Shape

The refresh task

The refresh task is one tokio task spawned at startup. It loops:

Sleep until the next tick.
Make the small set of HTTP requests needed to refresh the model.
Parse responses into typed values (no string-shaped state).
Write the new snapshot into the in-memory AppState with a single swap, not a series of partial mutations.
Persist to disk: private_servers.json for the current snapshot, append to ps_history.bin for the rolling history.
If anything failed, fire an admin alert. Continue the loop.
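
The loop above can be sketched as follows. This is a simplified, std-only illustration: the real task runs on tokio, and the `Snapshot` fields, `refresh_once`, `run_loop`, and the closure-based `fetch`/`persist`/`alert` parameters are assumptions made so the example stays self-contained.

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Typed snapshot of the live page (field names are assumptions).
#[derive(Debug, PartialEq)]
struct Snapshot {
    servers: Vec<(String, u32)>, // (server name, player count)
    total_players: u64,
}

struct AppState {
    current: RwLock<Arc<Snapshot>>,
}

/// One pass of the loop body: fetch and parse, swap, persist.
/// Errors are returned so the caller can alert and keep looping.
fn refresh_once(
    state: &AppState,
    fetch: impl Fn() -> Result<Snapshot, String>,
    persist: impl Fn(&Snapshot) -> Result<(), String>,
) -> Result<(), String> {
    // Fetch and parse into a typed value; a failure leaves the old snapshot in place.
    let next = Arc::new(fetch()?);
    // Publish with a single pointer swap, not a series of partial mutations.
    *state.current.write().unwrap() = Arc::clone(&next);
    // Persist the snapshot that was just published.
    persist(&next)
}

/// Sleep, refresh, alert on failure, continue. `ticks` bounds the loop
/// so the sketch terminates; the real task loops forever.
fn run_loop(
    state: &AppState,
    tick: Duration,
    ticks: u64,
    fetch: impl Fn() -> Result<Snapshot, String>,
    persist: impl Fn(&Snapshot) -> Result<(), String>,
    alert: impl Fn(&str),
) {
    for _ in 0..ticks {
        thread::sleep(tick);
        if let Err(e) = refresh_once(state, &fetch, &persist) {
            alert(&e); // report and carry on; never break out of the loop
        }
    }
}
```

Passing the fetch, persist, and alert steps in as closures keeps the loop body testable without a network or a disk.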

Failure

The failure mode I optimized for

The pessimistic case is not "the refresh task fails once". The pessimistic case is "the refresh task fails for an hour and nobody notices". Three design choices fall out of that:

Backoff is paranoid but bounded. A single failure waits the same twenty seconds and tries again. A run of failures backs off — exponentially up to a small ceiling — but never disables polling. Self-healing is more valuable than throughput on a hobby site.
Alerting is fire-and-forget. The HTTP path that produces the alert never blocks on the alert delivery. tokio::spawn, post the webhook, drop on the floor if Discord is down. The alternative is a polling task that gets stuck on a slow webhook and starves the actual refresh.
Retry context survives restarts. Each fetch failure (total-players, private-servers, share-code resolve) is recorded against a named "kind" with a consecutive-count, a first-failure timestamp, the last error string, and the last HTTP status. The map is mirrored to a small JSON file on every change, so a process restart in the middle of an outage doesn't reset the timeline. Alerts only fire at consecutive-count milestones (1, 5, 25, 100), and a "recovered" alert closes the incident when the streak ends — between milestones the context is updated silently.
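
The backoff and milestone rules are small enough to write as pure functions. In this sketch the 320-second ceiling is an assumed value (the text only says "a small ceiling"), and `next_delay` and `is_milestone` are hypothetical names; the milestone counts are the ones listed above.

```rust
use std::time::Duration;

/// Normal tick: a single failure waits this long and retries.
const BASE: Duration = Duration::from_secs(20);
/// Assumed ceiling for the backoff; the real value is not stated in the text.
const CEILING: Duration = Duration::from_secs(320);

/// Delay before the next attempt after `consecutive_failures` failures in a row.
/// Zero or one failure keeps the normal tick; each further failure doubles
/// the wait until it reaches CEILING. Polling is never disabled.
fn next_delay(consecutive_failures: u32) -> Duration {
    // Cap the shift so the multiplication cannot overflow.
    let doublings = consecutive_failures.saturating_sub(1).min(16);
    let secs = BASE.as_secs().saturating_mul(1u64 << doublings);
    Duration::from_secs(secs).min(CEILING)
}

/// Alerts fire only at these consecutive-failure counts;
/// between milestones the context is updated silently.
fn is_milestone(consecutive_failures: u32) -> bool {
    matches!(consecutive_failures, 1 | 5 | 25 | 100)
}
```

With these constants the wait goes 20, 40, 80, 160, 320 seconds and then stays at 320 for the rest of the streak.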

Persistence

Persistence as a separate concern

The refresh task does not own the persistence format. It hands a snapshot to a persistence module that knows how to write JSON for the live state and a compact binary append for history. Two reasons:

If the persistence layer needs to change format — and it has — the refresh task does not need to know.
The refresh task can be tested with an in-memory fake persistence; the persistence layer can be tested without a network.
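
A trait boundary is one way to get both properties. The sketch below is illustrative only: the `Persistence` trait, `FakePersistence`, `publish`, and the single-field `Snapshot` are assumptions, not the site's actual module.

```rust
use std::sync::Mutex;

/// Stand-in for the real snapshot type (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    total_players: u64,
}

/// The only thing the refresh task knows about persistence.
trait Persistence {
    fn save(&self, snapshot: &Snapshot) -> Result<(), String>;
}

/// In-memory fake for tests: records every snapshot it is handed.
#[derive(Default)]
struct FakePersistence {
    saved: Mutex<Vec<Snapshot>>,
}

impl Persistence for FakePersistence {
    fn save(&self, snapshot: &Snapshot) -> Result<(), String> {
        self.saved.lock().unwrap().push(snapshot.clone());
        Ok(())
    }
}

/// What the refresh task would call after building a snapshot.
/// It never sees file names or on-disk formats.
fn publish(store: &dyn Persistence, snapshot: &Snapshot) -> Result<(), String> {
    store.save(snapshot)
}
```

A disk-backed implementation of the same trait can change its format freely, since callers only depend on `save`.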

The history file is a binary append-only log because text-shaped histories grow obnoxiously fast and binary parsing is cheaper than JSON for this workload. The cost is a tiny amount of extra code; the benefit is six months of history fitting in a sensible size.
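
To make the size argument concrete, here is a sketch of a fixed-width append-only record. The layout (an 8-byte timestamp plus a 4-byte player count, little-endian) is an assumption chosen for illustration; the actual ps_history.bin format is not described here, and `HistoryRecord`, `encode`, `decode`, and `append` are hypothetical names.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Assumed record layout: 8-byte unix timestamp + 4-byte player count.
const RECORD_LEN: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq)]
struct HistoryRecord {
    unix_secs: u64,
    players: u32,
}

/// Serialize one record as fixed-width little-endian bytes.
fn encode(r: HistoryRecord) -> [u8; RECORD_LEN] {
    let mut buf = [0u8; RECORD_LEN];
    buf[..8].copy_from_slice(&r.unix_secs.to_le_bytes());
    buf[8..].copy_from_slice(&r.players.to_le_bytes());
    buf
}

/// Parse one record; no tokenizing or number parsing is needed.
fn decode(buf: &[u8; RECORD_LEN]) -> HistoryRecord {
    let mut ts = [0u8; 8];
    let mut players = [0u8; 4];
    ts.copy_from_slice(&buf[..8]);
    players.copy_from_slice(&buf[8..]);
    HistoryRecord {
        unix_secs: u64::from_le_bytes(ts),
        players: u32::from_le_bytes(players),
    }
}

/// Append one record to the end of the log, creating the file if needed.
fn append(path: &Path, r: HistoryRecord) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(&encode(r))
}
```

Under this assumed layout, one record per 20-second tick comes to about 786,000 records over six months, or roughly 9.4 MB.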

Coupling

State swap, not state mutation

The HTTP handlers that serve the live page read from the same AppState the refresh task writes to. The discipline is to never mutate that state in place: the task computes a new snapshot, then swaps it in atomically. That way a request that lands mid-refresh sees either the old snapshot or the new snapshot, never a half-formed view. The cost is one extra allocation per refresh; the benefit is that no handler ever sees inconsistent state.
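
A common way to get this behavior in Rust is to keep the snapshot behind an `Arc` and replace the pointer under a short-lived lock. The sketch below uses std's `RwLock`; whether the real AppState uses this, a tokio lock, or a dedicated swap crate is not stated, and the `load`/`swap` method names and `Snapshot` fields are assumptions.

```rust
use std::sync::{Arc, RwLock};

/// Immutable once published (field names are assumptions).
#[derive(Debug, PartialEq)]
struct Snapshot {
    servers: Vec<(String, u32)>, // (server name, player count)
    total_players: u64,
}

struct AppState {
    current: RwLock<Arc<Snapshot>>,
}

impl AppState {
    fn new(initial: Snapshot) -> Self {
        AppState { current: RwLock::new(Arc::new(initial)) }
    }

    /// Handlers: clone the Arc (a reference-count bump) and release the lock
    /// immediately. The returned snapshot stays valid and unchanged even if
    /// a refresh lands while the response is being rendered.
    fn load(&self) -> Arc<Snapshot> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Refresh task: build `next` off to the side, then replace the pointer.
    /// Readers see either the old snapshot or the new one, never a mix.
    fn swap(&self, next: Snapshot) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

The write lock is held only for the pointer assignment, so handlers are never blocked for the duration of a refresh.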

Hindsight

What was already on the list

Two follow-ups had been sitting in the "fix when it bites" pile for a while. Both shipped together in the rework above:

Per-endpoint cadence. Originally everything refreshed on the same 15-second tick. The endpoints have different change rates, so the total-player aggregate is now polled every 40 seconds while the private-servers list runs at 20 seconds. Same freshness on the page, fewer requests against the slower endpoint.
Structured retry context. Alerts used to tell me that a refresh failed. They now tell me why: a stable kind ("roblox.total_players", "roblox.private_servers", "roblox.share_code_resolve"), the consecutive failure count, when the streak started, the last error message, and the last HTTP status — all persisted to disk so a restart doesn't lose the incident timeline.
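
The retry context described above might be modeled along these lines. This is a sketch under assumptions: the struct and method names are hypothetical, timestamps are plain unix seconds, and mirroring the map to JSON on every change is left out because it would need a serializer.

```rust
use std::collections::HashMap;

/// Per-kind failure streak, with the fields the alerts report.
#[derive(Debug, Clone, PartialEq)]
struct RetryContext {
    consecutive: u32,
    first_failure_unix: u64,
    last_error: String,
    last_status: Option<u16>,
}

/// Keyed by a stable kind such as "roblox.total_players".
#[derive(Default)]
struct RetryMap {
    by_kind: HashMap<String, RetryContext>,
}

impl RetryMap {
    /// Record a failure and return the new consecutive count.
    /// The first-failure timestamp is set once, when the streak starts.
    fn record_failure(&mut self, kind: &str, now_unix: u64, error: &str, status: Option<u16>) -> u32 {
        let ctx = self.by_kind.entry(kind.to_string()).or_insert(RetryContext {
            consecutive: 0,
            first_failure_unix: now_unix,
            last_error: String::new(),
            last_status: None,
        });
        ctx.consecutive += 1;
        ctx.last_error = error.to_string();
        ctx.last_status = status;
        ctx.consecutive
    }

    /// Record a success. Returns the streak that just ended, if any,
    /// so the caller can send a "recovered" alert with its details.
    fn record_success(&mut self, kind: &str) -> Option<RetryContext> {
        self.by_kind.remove(kind)
    }
}
```

The count returned by `record_failure` is what a milestone check would consume, and the value returned by `record_success` carries everything a "recovered" alert needs.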

Neither was urgent at the time. The refresh task had held up across multiple Roblox API changes without a rate-limit incident, and the polling story was, for a long time, the least interesting thing about SP-Legends. Both items took a quiet afternoon to ship and immediately paid for themselves the next time upstream wobbled.
