Building a Real-Time TCP WebSocket Multiplayer Engine in Rust for JavaScript Browser Clients

A technical walkthrough of Rainboids’ authoritative-server multiplayer, written as a teaching text for engineers who want to build something similar.


1. Why this article exists

Rainboids is a browser-based co-op shooter. The single-player game has lived
happily inside one JavaScript bundle. Adding multiplayer broke that
simplicity in three places at once:

  1. Authority. Someone has to decide who hit what, who died, who picked up
    the orb. If two clients disagree, the players see two different games.
  2. Latency. A 60 Hz game over the public internet has 30–150 ms of network
    delay. The local ship has to feel snappy anyway.
  3. Cheating. A client that simulates its own gameplay can lie. The server
    has to be the source of truth.

The solution we landed on is the same shape Quake III shipped in 1999 and
which every co-op shooter has used since: an authoritative server running
the real simulation, with clients that predict their own ship locally and interpolate everyone else. What is new here is that the server is written
in Rust while the client runs the same simulation in JavaScript — and a
parity test rig in CI guarantees the two implementations stay bit-identical
on the parts that matter.

This article walks through the design and implementation of that system as it
exists in the server/ crate, the schema/ directory, and the planning docs
under docs/. The intent is that a motivated engineer who has shipped a
single-player game could read this top to bottom and have a credible mental
model of how to build the next one with online multiplayer.

The companion planning documents (docs/Multiplayer Rust Server – 2026-05-07.md,
docs/Multiplayer Rust Client Engine – 2026-05-07.md,
docs/Multiplayer Wire Format – 2026-05-09.md, and schema/SIM_SPEC.md) are
worth reading after this article for the parts we skim.


2. The fundamental architecture

   ┌──────────┐                              ┌──────────────────────┐
   │ Browser  │  WebSocket (binary frames)   │  rainboids-server    │
   │  client  │ ◀──────────────────────────▶ │  (Rust, 1 process)   │
   └──────────┘                              └──────────────────────┘
        │                                              │
        │ runs js/sim/  (the SAME simulation,          │ runs server/src/sim/
        │   for local prediction & solo play)          │   authoritative
        ▼                                              ▼

           parity harness (CI) diffs the two on golden fixtures

Three load-bearing claims hide inside that picture:

  • The server’s simulation is canonical. Anything that matters for
    gameplay state — HP, enemy positions, which bullet killed which asteroid —
    is decided server-side and broadcast to clients.
  • The client runs its own copy of the simulation for two reasons: solo
    play (no server involved) and prediction (so the local ship doesn’t feel
    laggy). Prediction means the client doesn’t wait for a server round-trip
    to move its own ship — it simulates the input immediately, then reconciles
    against the server’s view when a snapshot arrives.
  • Both simulations are kept in sync by engineering discipline. Identical
    module layouts (js/sim/ship.js ↔ server/src/sim/ship.rs), shared
    algorithm constants, fixed-point arithmetic for the prediction-relevant
    fields, and a parity test that replays the same inputs through both
    implementations and diffs the output.

The asymmetric trade is the central premise of choosing Rust here: you pay
the implement-it-twice cost in exchange for bounded tail latency, raw CPU
efficiency, and a static-binary deploy. If you weren’t going to operate the
server for years, you’d write the server in JS and skip the duplication
entirely. We did choose Rust; the rest of this article is how you make that
trade work.


3. The tech stack

… and the reasoning behind it.

From server/Cargo.toml:

tokio = { version = "1", features = ["full"] }
axum = { version = "0.7", features = ["ws", "macros"] }
futures-util = "0.3"
serde = { version = "1", features = ["derive"] }
bincode = "1.3"
dashmap = "5"
glam = { version = "0.27", features = ["serde"] }
rand = "0.8"
rand_pcg = "0.3"
uuid = { version = "1", features = ["v4", "serde"] }
nanoid = "0.4"
tracing = "0.1"
metrics = "0.23"
metrics-exporter-prometheus = "0.15"

Why each one:

  • tokio is the async runtime. Game servers don’t actually need
    multi-threading until you scale past a couple hundred rooms, but Tokio’s
    select! macro is the cleanest way to write the per-room tick loop
    (compete a 16 ms timer against an unbounded inbound channel). Set
    flavor = "multi_thread" and you get free per-room parallelism when
    you want it.
  • axum is the HTTP/WebSocket router. We use exactly three endpoints
    (/health, /metrics, /ws). Anything heavier (a REST API, sessions,
    cookies) wouldn’t move the needle.
  • bincode is the wire codec. Compact (no field names on the wire),
    schema-from-Rust-types (you write #[derive(Serialize, Deserialize)] on
    your enum and the wire format follows the field declaration order), and
    the encoded bytes are easy to mirror in a JS decoder. See §6.
  • rand_pcg::Pcg64 is the seeded random number generator. We need a
    PRNG whose algorithm is defined in writing (not “whatever your standard
    library does”), because we replicate it in JS with BigInt for the
    client-side simulation. PCG-64 is fast, statistically excellent, and tiny.
  • uuid + nanoid — Uuid v4 for session tokens (long, opaque,
    reconnect-safe). Nanoid for human-typeable room codes (6 chars from a
    28-char alphabet that excludes confusable glyphs like 0/O, I/1).
  • tracing + metrics — structured logging with span correlation,
    Prometheus exporter. Operating a multiplayer game without these is
    flying blind. The room loop emits a rainboids_tick_duration_seconds
    histogram every tick; if anything ever stalls, you see it in Grafana.
  • glam for Vec2 math (mostly outside the deterministic path; see §7).

What we don’t use:

  • No ECS (bevy_ecs, hecs). Entity counts are low (≤200 per room) and
    a Vec<Enemy> with explicit fields is easier to reason about than a
    generic archetype storage. If we ever hit 10,000 entities per room,
    revisit.
  • No tokio-tungstenite directly — axum’s WebSocket extractor wraps it.
  • No async sleep inside the simulation. Time is the tick counter; “spawn
    in 2 seconds” is “spawn at tick + 120”.
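
To make that last point concrete, here is a minimal, self-contained sketch
of scheduling by tick count instead of wall clock. The names (PendingSpawn,
TICK_HZ, the helper functions) are illustrative, not the crate’s API:

const TICK_HZ: u32 = 60;

/// One deferred action, keyed by the tick at which it becomes due.
struct PendingSpawn {
    at_tick: u32,
}

/// "Spawn in 2 seconds" becomes "spawn at current_tick + 120".
fn schedule_in_secs(current_tick: u32, secs: f32) -> PendingSpawn {
    PendingSpawn { at_tick: current_tick + (secs * TICK_HZ as f32) as u32 }
}

/// Called once per simulated tick; returns how many deferred spawns came due.
fn drain_due(current_tick: u32, pending: &mut Vec<PendingSpawn>) -> usize {
    let before = pending.len();
    pending.retain(|p| p.at_tick > current_tick); // keep only not-yet-due entries
    before - pending.len()
}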

4. The actor model: matchmaking and rooms

The server is a tree of async tasks:

                axum HTTP listener
                       │
                       ▼
              ┌────────────────────┐
              │  ConnectionTask    │   one per WebSocket
              │  (read+write       │   reads bincode frames,
              │   half + outbound  │   routes to a room or to
              │   mpsc<ServerMsg>) │   the matchmaker
              └──────────┬─────────┘
                         │
                ┌────────┴────────┐
                ▼                 ▼
        ┌──────────────┐   ┌──────────────┐
        │ Matchmaker   │   │ RoomActor    │
        │ (singleton)  │   │ (one per     │
        │              │   │  active room)│
        └──────────────┘   └──────────────┘
                              60 Hz tick

The pattern is: one async task per stateful object, and the only way to
mutate that object’s state is to send it a message. This is the classic
actor model. In Rust it lands very cleanly because:

  • The actor owns its state by value. No Arc<Mutex<...>> anywhere.
  • The mailbox is tokio::sync::mpsc. Senders are cheap to clone; the
    receiver lives inside the actor’s run loop.
  • The compiler enforces that nobody else holds a reference to the state.

Let’s look at the Room struct (server/src/room/mod.rs:95):

pub struct Room {
    id: RoomId,
    code: String,
    state: GameState,
    sim_state: RoomState, // WaitingForPlayers | Playing | …
    inputs: HashMap<PlayerId, InputBuffer>,
    rng: Pcg64,
    seed: u64,
    tick: u32,
    players: Vec<Player>,
    cmd_rx: mpsc::Receiver<RoomInbound>,
    pending_events: Vec<GameEvent>,
    grace: HashMap<PlayerId, GraceTimer>,
    cfg: std::sync::Arc<Config>,
    encode_buf: Vec<u8>,
}

Everything the room needs is in one struct, owned by one task, mutated by
one mailbox loop. A RoomHandle is just a thin wrapper around mpsc::Sender<RoomInbound>:

pub struct RoomHandle {
    tx: mpsc::Sender<RoomInbound>,
    id: RoomId,
}

External tasks (the matchmaker, the connection task) only interact with a
room by room_handle.send(RoomInbound::Input { ... }).await. They never see
the GameState directly. This means we can never race the simulation tick
against an input being applied — the inbound queue is the serialization
point.

RoomInbound is the full vocabulary of “things that can happen to a room”
(server/src/room/mod.rs:32):

pub enum RoomInbound {
    Join { player_id, display_name, out },
    Reattach { player_id, display_name, out },
    Leave { player_id, reason },
    Disconnected { player_id },
    Input { player_id, tick, packed },
    Ack { player_id, snapshot_tick },
    Summary { reply }, // matchmaker pulls cheap counts
    Shutdown,
}

Notice that each variant carries the player id — the room doesn’t trust
the sender to be honest about identity. The connection task already knows
which player it represents (issued in the welcome message) and stamps
inbound messages with that id before forwarding.

Why one room per task, not “all rooms in one event loop”

Two reasons:

  1. CPU isolation. A pathological room (1000 bullets, slow tick) can’t
    stall other rooms’ ticks if they’re on different Tokio worker threads.
  2. Mental model. Inside a room, there is exactly one execution context.
    You never have to think about “what if input arrives during tick” because
    select! makes that case structurally impossible — the loop is either
    draining the inbox or computing the tick, never both.

5. The per-room simulation loop

This is the most important code in the server. From server/src/room/mod.rs:430:

async fn run(mut room: Room) {
    let tick_dur = Duration::from_secs_f64(1.0 / room.cfg.tick_hz as f64);
    let mut tick_interval = interval(tick_dur);
    tick_interval.set_missed_tick_behavior(MissedTickBehavior::Burst);
    let snapshot_every = (room.cfg.tick_hz / room.cfg.snapshot_hz).max(1);
    let mut tick_counter: u32 = 0;
    loop {
        tokio::select! {
            biased;
            cmd = room.cmd_rx.recv() => {
                match cmd {
                    Some(c) => room.enqueue(c),
                    None => break,
                }
            }
            _ = tick_interval.tick() => {
                room.drain_inbound();
                let started = Instant::now();
                room.simulate_one_tick();
                metrics::histogram!("rainboids_tick_duration_seconds")
                    .record(started.elapsed().as_secs_f64());
                tick_counter = tick_counter.wrapping_add(1);
                if tick_counter % snapshot_every == 0 {
                    room.broadcast_snapshot();
                }
                room.broadcast_pending_events();
                room.reap_grace();
                if room.should_shutdown() { break; }
            }
        }
    }
    room.cleanup();
}

Let’s slow down on every line. Each one is intentional.

5.1 Tick pacing

The simulation runs at 60 Hz (16.66 ms per tick). The interval is from
tokio::time::interval, which fires every tick_dur. The
MissedTickBehavior::Burst setting is important: if the runtime is briefly
late and three tick intervals elapse before we poll, we want the next three
calls to tick() to return immediately so the simulation can catch up.
If we picked Delay instead, we’d permanently lose those ticks — the
simulation would silently run slower than the wall clock and clients would
get out of sync. (Skip would silently drop the missed ticks; same
problem, different shape.)

Snapshots go out at 20 Hz, which is snapshot_every = 60 / 20 = 3
ticks. The justification for the 60/20 split:

  • Inputs at 60 Hz because the player can see single-frame artifacts at
    the display refresh rate. Slower input = sluggish controls.
  • Snapshots at 20 Hz because that’s enough to interpolate smoothly
    (~50 ms apart) and triples our bandwidth budget. A snapshot is the
    biggest message we send.
  • Events on demand for things that have to feel synchronized (bullet
    spawn, enemy destroy). See §6.4.

5.2 biased; and why it matters

tokio::select! is normally non-deterministic across its branches — Tokio
picks pseudo-randomly when multiple are ready. biased; makes it always
poll the branches in declaration order. We put the inbound channel first
on purpose: if inputs and a tick deadline both arrive in the same
scheduling slice, we want to drain all inputs before simulating. The
opposite order would let the room simulate one tick with stale input
before draining.

5.3 drain_inbound() — last-write-wins input

When a tick fires, we drain whatever inputs accumulated since last tick:

fn drain_inbound(&mut self) {
    while let Ok(msg) = self.cmd_rx.try_recv() {
        self.enqueue(msg);
    }
}

For RoomInbound::Input, the room keeps only the latest packed input per
player (server/src/room/mod.rs:161):

RoomInbound::Input { player_id, tick, packed } => {
    let buf = self.inputs.entry(player_id).or_default();
    if tick >= buf.last_tick {
        buf.latest = packed.into();
        buf.last_tick = tick;
    }
}

This is last-write-wins with a tick guard against out-of-order
delivery. It is the most important policy decision for input handling and
it has trade-offs:

  • ✅ The simulation never has to know about jitter. Each tick gets exactly
    one input per player, no buffering, no smoothing.
  • ✅ A player who lags briefly catches up gracefully — their next packet
    replaces the stale one, and the server just keeps going.
  • ❌ Bursts of inputs between two server ticks are lost. If a player taps
    fire-then-stop-then-fire faster than 60 Hz, the middle states never
    reach the server.

For a co-op shooter where button-mashing isn’t the dominant input pattern,
this is the right trade. For a fighting game it would be unacceptable —
you’d need an input queue that the simulation drains one entry per tick.
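
For contrast, a minimal sketch of that alternative: a bounded per-player
queue drained one entry per tick, so rapid taps between server ticks
survive. This is not what Rainboids ships; PackedInput is the type from
§6.5, and the bound of 8 entries is an arbitrary illustration.

use std::collections::VecDeque;

struct QueuedInputs {
    queue: VecDeque<PackedInput>,
    last: PackedInput, // reused when the queue runs dry
}

impl QueuedInputs {
    fn push(&mut self, input: PackedInput) {
        // Bound the buffer so a spammy or badly lagged client can't grow it forever.
        if self.queue.len() < 8 {
            self.queue.push_back(input);
        }
    }

    fn take_for_tick(&mut self) -> PackedInput {
        // Exactly one queued input is consumed per simulation tick;
        // if none is waiting, the previous input is held.
        if let Some(next) = self.queue.pop_front() {
            self.last = next;
        }
        self.last
    }
}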

5.4 simulate_one_tick() — the actual game logic

fn simulate_one_tick(&mut self) {
    if !matches!(self.sim_state, RoomState::Playing) {
        return;
    }
    let dt = 1.0 / self.cfg.tick_hz as f32;
    let inputs: PlayerInputs = self.inputs.iter()
        .map(|(id, buf)| (*id, buf.latest))
        .collect();
    simulate_tick(
        &mut self.state,
        &inputs,
        dt,
        &mut self.rng,
        &mut self.pending_events,
    );
    self.tick = self.tick.wrapping_add(1);
}

The simulation itself (server/src/sim/mod.rs:28) is a pure function:
state, inputs, dt, rng, and an output event buffer in, mutated state out.
No I/O, no clocks, no networking:

pub fn simulate_tick(
    state: &mut GameState,
    inputs: &PlayerInputs,
    dt: f32,
    rng: &mut Pcg64,
    events: &mut Vec<GameEvent>,
) {
    ship::update_all(&mut state.ships, inputs, dt, events);
    enemy::update_all(&mut state.enemies, &state.ships, dt, rng, events);
    asteroid::update_all(&mut state.asteroids, dt, events);
    collision::detect_and_resolve(state, events);
    drops::update(&mut state.drops, &state.ships, dt, events);
    wave::tick(&mut state.wave, &mut state.enemies, dt, rng, events);
}

This factoring is the single biggest enabler of everything else in this
article. Because simulate_tick is a pure function:

  • We can run it inside cargo test with no runtime, no sockets.
  • The JS client can run a port of it inside simulateTick for prediction.
  • The parity test can run both on the same input and diff.
  • Replay, rewind-and-resimulate, deterministic regression tests — all easy.

If you take one lesson from this article, take this one: make your simulation a pure function early. Every other piece of multiplayer is
easier when this is true.
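
A hedged sketch of what that buys in practice: an ordinary #[test] that
drives the function directly, no Tokio, no sockets. GameState::default(),
PlayerInputs::default(), and the ships/hp fields are assumptions made for
illustration.

#[test]
fn thousand_ticks_without_a_runtime() {
    let mut state = GameState::default();
    let inputs = PlayerInputs::default();
    let mut rng = sim_rng::from_seed(42);
    let mut events = Vec::new();

    for _ in 0..1_000 {
        simulate_tick(&mut state, &inputs, 1.0 / 60.0, &mut rng, &mut events);
        events.clear(); // a real test might assert on events instead
    }

    // A hypothetical invariant check; the point is we got here with no I/O.
    assert!(state.ships.iter().all(|s| s.hp >= 0.0));
}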

5.5 The event queue pattern

Notice the events: &mut Vec<GameEvent> parameter. Subsystems push events
rather than directly causing side effects:

// inside collision.rs
events.push(GameEvent::BulletDespawn { id, reason: DespawnReason::Hit });
events.push(GameEvent::AsteroidDestroy { id, by: Some(player), fragments });

The room loop drains pending_events and broadcasts them as
ServerMsg::Event frames after the tick:

fn broadcast_pending_events(&mut self) {
    for ev in self.pending_events.drain(..) {
        let msg = ServerMsg::Event { tick: self.tick, event: ev };
        for p in &self.players {
            if !p.lagging { let _ = p.out.try_send(msg.clone()); }
        }
    }
}

On the client, the same event queue feeds the effect layer — particle
spawns, screen shake, sounds. The simulation never imports the audio
manager or touches the DOM. This is the engine refactor that has to
happen on the JS side before any of the network work can start.

5.6 Snapshots vs events: the bandwidth model

We send two kinds of state:

  • Snapshot (20 Hz): bulk slow-changing state, i.e. ship positions, HP,
    enemy positions/HP, asteroid positions, and drop positions. A lost
    one is fine; the next snapshot replaces it.
  • Event (on demand): discrete instants such as bullet spawn, enemy
    destroyed, orb collected, wave clear. Must not be lost, but events
    are additive so deduplication is cheap.

You only ever interpolate snapshots. You only ever react to events. They
should never overlap responsibilities.

The snapshot payload (server/src/protocol/...) is currently
SnapshotPayload { ships, enemies, asteroids, drops }. Each entity type
is a tiny f32-typed struct: id, position, velocity, hp. About 16–32 bytes
per entity. With 4 ships + 50 enemies + 30 asteroids + 10 drops, that is
roughly 1.5–3 KB per snapshot, or about 30–60 KB/s per player at 20 Hz,
which sits comfortably within the downstream budget of any home
connection.

Future optimization: delta encoding against the receiver’s last-acked
tick. We already plumb base_tick: Option<u32> in the snapshot frame.
For v1 the optimization is unimplemented (snapshots are always full); the
field-shape stays so we don’t have to bump the wire version to add it.

5.7 Backpressure: the lagging flag

The client outbound channel is bounded (OUTBOUND_BUFFER = 256 in
server/src/server/connection.rs:29). When the room tries to broadcast
and the channel is full, that means the client is consuming slower than
the server is producing — usually a stalled WebSocket write:

match p.out.try_send(msg.clone()) {
    Ok(_) => {}
    Err(tokio::sync::mpsc::error::TrySendError::Full(_)) => {
        p.lagging = true;
        warn!(player_id = %p.id, "client lagging; dropping snapshots");
    }
    Err(_) => {}
}

While lagging is set we skip snapshots to that player. (Events, which
must not be lost, do still try.) The plan calls for kicking the client
after sustained lag; for v1 the flag is a flag, not an eviction.

The principle is the most important thing: never let a slow client back up the server. The server’s tick rate is the contract; clients are
ephemeral.


6. The wire protocol

The protocol layer answers: when bytes leave the server, what do they
mean? When the client gets bytes, how does it parse them?

6.1 The single source of truth: schema/protocol.toml

A surprising design choice. Rather than letting “the Rust code” or “the JS
code” be the source of truth, we put it in a third file —
schema/protocol.toml — that neither implementation reads at runtime,
but which both must match:

wire_version = 1
sim_version = 1
codec = "bincode-1.x-default-with-fixint-le"

[[message.client]]
name = "Hello"
fields = [
    { name = "wire_version", type = "u16" },
    { name = "sim_version", type = "u16" },
    { name = "client_version", type = "String" },
    { name = "display_name", type = "String" },
    { name = "session", type = "Option<Uuid>" },
]

[[message.client]]
name = "Input"
fields = [
    { name = "tick", type = "u32" },
    { name = "packed", type = "PackedInput" },
]
# ...

Why a third file:

  • No language gets to be “right by default”. If the JS codec disagrees
    with the Rust codec, neither side has a privileged claim. The TOML file
    is the tiebreaker, and CI’s tools/check-schema.mjs runs on every PR
    to verify both sides match.
  • Codegen target. v2 of this stack will read protocol.toml and emit
    both server/src/protocol/generated.rs and js/sim/protocol-generated.js.
    v1 is hand-mirrored, but the shape of the schema is already designed
    for codegen so we don’t have to relayout files later.
  • Documentation. Variant ordering, fixed-vs-tagged enums, prediction-
    relevant subsets — there’s exactly one place that lists everything.

6.2 The codec: bincode 1.x with with_fixint_encoding

// server/src/protocol/codec.rs
fn opts() -> impl bincode::Options {
    bincode::DefaultOptions::new()
        .with_fixint_encoding()
        .with_little_endian()
}

Three deliberate choices, each load-bearing:

  1. Little-endian. Matches every browser ever and the vast majority of
    server CPUs. No byte-swapping anywhere.
  2. with_fixint_encoding() means integers are written at their natural
    width (u8=1, u16=2, u32=4, u64=8) rather than bincode’s default varint
    encoding. This costs a handful of bytes but it makes the JS decoder
    vastly simpler: every length prefix is exactly 8 bytes, every enum
    tag is exactly 4 bytes, nothing depends on the magnitude of the value
    you’re decoding.
  3. Default field order. Bincode doesn’t write field names, just the
    field values in struct-declaration order. Add a field in the middle of
    a struct and you’ve silently broken the wire format. We use this as a
    discipline gate: append-only struct fields, append-only enum variants. The schema TOML enforces it; the parity tests catch
    deviations.
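
A sketch of how those options get applied end to end. encode_server is the
helper the writer task calls in §8.1; decode_client and the exact
signatures here are assumptions:

use bincode::Options;

pub fn encode_server(msg: &ServerMsg) -> Result<Vec<u8>, bincode::Error> {
    opts().serialize(msg)
}

pub fn decode_client(bytes: &[u8]) -> Result<ClientMsg, bincode::Error> {
    opts().deserialize(bytes)
}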

6.3 Frame layout, concretely

From docs/Multiplayer Wire Format – 2026-05-09.md — the authoritative
byte-level spec:

Every WebSocket binary frame is exactly one bincode-encoded ClientMsg
or ServerMsg. No length prefix, no envelope, no fragmentation.
The frame is the message.

For example, ClientMsg::Hello { wire_version: 1, sim_version: 1, client_version: "5.79.62", display_name: "Pilot", session: None } encodes as:

  Bytes                       What
  00 00 00 00                 u32 enum tag — Hello is variant 0 of ClientMsg
  01 00                       u16 wire_version
  01 00                       u16 sim_version
  07 00 00 00 00 00 00 00     u64 length prefix for “5.79.62”
  35 2e 37 39 2e 36 32        UTF-8 bytes of “5.79.62”
  05 00 00 00 00 00 00 00     u64 length prefix for “Pilot”
  50 69 6c 6f 74              UTF-8 bytes of “Pilot”
  00                          Option tag = None

Golden hex dumps for every message variant live in
server/tests/wire_golden.rs. A JS test fixture inside the parity harness
asserts byte-identical output. If you ever change a struct field, the
golden test fails and you know to bump WIRE_VERSION.
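
The shape of such a golden test, as a hedged sketch (encode_client is a
hypothetical helper mirroring encode_server; the expected bytes are the
ones from the table above):

#[test]
fn hello_encodes_to_golden_bytes() {
    let msg = ClientMsg::Hello {
        wire_version: 1,
        sim_version: 1,
        client_version: "5.79.62".to_string(),
        display_name: "Pilot".to_string(),
        session: None,
    };
    let bytes = codec::encode_client(&msg).unwrap();
    let golden: &[u8] = &[
        0x00, 0x00, 0x00, 0x00,                   // u32 enum tag: Hello = variant 0
        0x01, 0x00,                               // wire_version = 1
        0x01, 0x00,                               // sim_version = 1
        0x07, 0, 0, 0, 0, 0, 0, 0,                // u64 length of "5.79.62"
        0x35, 0x2e, 0x37, 0x39, 0x2e, 0x36, 0x32, // "5.79.62"
        0x05, 0, 0, 0, 0, 0, 0, 0,                // u64 length of "Pilot"
        0x50, 0x69, 0x6c, 0x6f, 0x74,             // "Pilot"
        0x00,                                     // Option tag: None
    ];
    assert_eq!(bytes, golden);
}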

6.4 The full message vocabulary

The wire is three tagged unions:

  • ClientMsg — what the client can say. Hello, room intents
    (QuickMatch, BrowseRooms, CreateRoom, JoinRoom, JoinRoomByCode, LeaveRoom),
    per-tick play (Input, Ack, Pong, PowerupChoose, Revive, Chat).
  • ServerMsg — what the server says back. Welcome, Error, RoomList,
    RoomJoined, RoomLeft, PeerJoined, PeerLeft, Snapshot, Event, Ping.
  • GameEvent — discrete in-game events, packaged inside
    ServerMsg::Event { tick, event }. BulletSpawn, BulletDespawn,
    EnemyDestroy, AsteroidDestroy, OrbCollect, PlayerDamaged, PlayerDowned,
    PlayerRevived, WaveStart, WaveClear, PowerupOffer, PowerupChosen,
    HitFlash, DamageNumber.

The full schema is in schema/protocol.toml. The key design decisions:

  • Two messages on one wire. No multiplexing layer; ClientMsg and
    ServerMsg are different enums even though they share the framing.
  • One message per WS frame. No batching for v1.
  • Tick stamps on Input and Snapshot. Lets the client correlate “the
    snapshot I just got was for my input at tick T” — the key prerequisite
    for prediction reconciliation (§7).

6.5 Input packing: 7 bytes per tick

The smallest, most important message:

#[derive(Serialize, Deserialize, Copy, Clone)]
pub struct PackedInput {
    pub move_x: i8,  // -127..127 normalized
    pub move_y: i8,
    pub aim_x: i16,  // -32767..32767 unit vector
    pub aim_y: i16,
    pub buttons: u8, // bit 0=shoot, 1=dash, 2=ab1, 3=ab2, …
}

Seven bytes plus a u32 tick stamp plus a u32 enum tag = 15 bytes per
input. At 60 Hz that’s 900 B/s upstream per player, before WebSocket
framing. Negligible.

The server unpacks immediately into a PlayerInput with f32 fields
(server/src/sim/input.rs:22):

impl From<PackedInput> for PlayerInput {
    fn from(p: PackedInput) -> Self {
        PlayerInput {
            move_x: (p.move_x as f32) / 127.0,
            move_y: (p.move_y as f32) / 127.0,
            aim_x: (p.aim_x as f32) / 32767.0,
            aim_y: (p.aim_y as f32) / 32767.0,
            shoot: p.buttons & 0x01 != 0,
            dash: p.buttons & 0x02 != 0,
            ability1: p.buttons & 0x04 != 0,
            ability2: p.buttons & 0x08 != 0,
        }
    }
}

The packed form is a wire concern; the simulation never sees it.
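
For completeness, a sketch of the inverse packing step, written in Rust
for consistency with the other examples; in the real system this lives in
the JS input layer. The clamping and bit layout follow the struct above.

impl From<PlayerInput> for PackedInput {
    fn from(p: PlayerInput) -> Self {
        let clamp = |v: f32| v.clamp(-1.0, 1.0);
        PackedInput {
            move_x: (clamp(p.move_x) * 127.0) as i8,
            move_y: (clamp(p.move_y) * 127.0) as i8,
            aim_x: (clamp(p.aim_x) * 32767.0) as i16,
            aim_y: (clamp(p.aim_y) * 32767.0) as i16,
            buttons: (p.shoot as u8)
                | ((p.dash as u8) << 1)
                | ((p.ability1 as u8) << 2)
                | ((p.ability2 as u8) << 3),
        }
    }
}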


7. Determinism: the hardest problem in cross-language multiplayer

The premise of client-side prediction is that the client can run a copy
of the simulation, get the same answer the server got, and only need to
correct itself when the server’s snapshot disagrees.

For this to work bit-perfectly, the JS simulation has to produce the
same floats as the Rust simulation for the same inputs. The naive approach
(“just use f32 everywhere”) fails the moment you call sin or cos:
V8’s Math.sin and Rust’s f32::sin are not bit-identical on all
inputs because they use different libm implementations. The drift is
microscopic per call but accumulates across thousands of ticks.

We solve this in three layers.

7.1 Layer 1: scope the determinism

From schema/protocol.toml:

[prediction]
relevant_fields = [
"Ship.x", "Ship.y", "Ship.vx", "Ship.vy",
"Bullet.x", "Bullet.y", "Bullet.vx", "Bullet.vy",
]

Determinism is required only for fields the client predicts. Ship
position/velocity (because we predict the local ship) and bullet
spawn position/velocity (because we want the local bullet to appear
instantly when the player fires).

Everything else — enemy HP, enemy positions, asteroid positions, drop
positions — is server-authoritative and interpolated on the client.
The client doesn’t try to predict them; it just lerps between the last
two snapshots. f32 drift between JS and Rust is invisible because the
client doesn’t run those calculations.

This is the single most important determinism decision: don’t try to
make the whole simulation deterministic. Pick the smallest subset that
needs to be bit-identical, lock it down hard, and let everything else
be approximate.

7.2 Layer 2: fixed-point math for the relevant subset

The current scaffold (server/src/sim/fxp.rs) is small but illustrative:

#[derive(Serialize, Deserialize, Copy, Clone, Default, PartialEq, Eq)]
pub struct Fxp(pub i32);

const FRAC_BITS: u32 = 16;
const ONE: i32 = 1 << FRAC_BITS;

impl std::ops::Mul for Fxp {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        Fxp(((self.0 as i64 * rhs.0 as i64) >> FRAC_BITS) as i32)
    }
}

Fxp is I16F16: an i32 holding 16 integer bits and 16 fractional
bits. The value 1.0 is 0x00010000. To multiply, you widen to i64 (to
avoid overflow in the intermediate), multiply, then shift right by
FRAC_BITS to get back to the I16F16 representation.

i32 arithmetic is bit-identical across architectures. ARM, x86, V8
running on either — they all produce the same answer because two’s-
complement integer arithmetic, wrapping overflow, and shifts are exactly
specified and behave the same on every modern machine. There is no
rounding mode to disagree about, no transcendental function to disagree
about, no NaN bit pattern.

Range: ±32,768 with 1/65,536 ≈ 15 µm precision. For a 32k-unit playfield,
that’s ample.
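
A hedged sketch of the helpers that usually sit around that Mul impl
(from_f32, to_f32, and Add are assumptions here; only Mul appears in the
excerpt above), plus the kind of update the prediction path runs with
them:

impl Fxp {
    pub fn from_f32(v: f32) -> Self { Fxp((v * ONE as f32) as i32) }
    pub fn to_f32(self) -> f32 { self.0 as f32 / ONE as f32 }
}

impl std::ops::Add for Fxp {
    type Output = Self;
    fn add(self, rhs: Self) -> Self { Fxp(self.0.wrapping_add(rhs.0)) }
}

// position += velocity * dt, entirely in I16F16, so the result is the
// same bits on every machine.
fn integrate(pos: Fxp, vel: Fxp, dt: Fxp) -> Fxp {
    pos + vel * dt
}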

JS implements the same type over Int32Array slots (or, where allocation
matters, plain numbers truncated to i32 with value | 0). The
multiplication uses Math.imul to do the 32×32→32 mul, plus split-half
arithmetic to recover the high bits. About 3× slower than f32 multiply
but acceptable for the ~10 multiplies per ship per tick.

7.3 Layer 3: polynomial trig

You can’t use sin from your standard library. You also can’t avoid sin
— ship facing-to-velocity, aim-to-bullet-velocity, all of it is angles.

From schema/protocol.toml:

[trig.sin_coeffs_f64]
c0 = 1.0
c1 = -0.16666666666666666 # -1/3!
c2 = 0.008333333333333333 # 1/5!
c3 = -0.0001984126984126984 # -1/7!
c4 = 0.0000027557319223985893 # 1/9!

Both sides implement sin(x) for x ∈ [-π, π] as a degree-9 Taylor
polynomial in fixed-point arithmetic. Same coefficients, same operation
order, same intermediate truncation — same output, byte for byte.
cos(x) = sin(x + π/2), and atan2 follows the same treatment.

Cost: about 8 fixed-point multiplies per call. The hardware FPU’s sin
is much faster, but it’s faster in different ways on different machines.
For the few times per tick we need an angle, the predictable cost wins.
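
A hedged sketch of the evaluation, shown in f64 for readability; the
shipping version runs the same Horner scheme over Fxp so that the
truncation points match on both sides.

// Coefficients copied from [trig.sin_coeffs_f64] above.
const SIN_COEFFS: [f64; 5] = [
    1.0,
    -0.16666666666666666,      // -1/3!
    0.008333333333333333,      //  1/5!
    -0.0001984126984126984,    // -1/7!
    0.0000027557319223985893,  //  1/9!
];

/// sin(x) for x in [-pi, pi], degree-9 Taylor polynomial via Horner's scheme.
fn det_sin(x: f64) -> f64 {
    let x2 = x * x;
    let mut acc = SIN_COEFFS[4];
    for c in SIN_COEFFS[..4].iter().rev() {
        acc = acc * x2 + *c;
    }
    x * acc
}

/// cos(x) = sin(x + pi/2); the real code also wraps the argument back into [-pi, pi].
fn det_cos(x: f64) -> f64 {
    det_sin(x + std::f64::consts::FRAC_PI_2)
}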

7.4 The seeded PRNG

Anything random in the simulation goes through a seeded PCG-64
(server/src/sim/rng.rs):

pub fn from_seed(seed: u64) -> Pcg64 { /* seed_from_u64 */ }

PCG-64 is a 128-bit-state linear congruential generator with an output
permutation. The algorithm is specified down to the multiplier
(2360ED051FC65DA44385DF649FCCF645 in hex) and the seeding procedure
(SplitMix64 to expand 64 bits to 32 bytes, then from_seed).

JS implements it with BigInt for the 128-bit state. About 5–10× slower
than Math.random — irrelevant because RNG is only called for spawn
decisions, drop rolls, asteroid splits (a handful of times per tick at
most), all of which sit outside the hot ship-prediction loop.

When a new room is created (server/src/room/mod.rs:117):

let seed: u64 = rand::random();
// ...
rng: sim_rng::from_seed(seed),

The seed is broadcast to clients in ServerMsg::RoomJoined. New
clients seed their local PRNG from the same value. From then on, both
sides produce identical RNG sequences for identical inputs — which is
the foundation of deterministic replay: same seed + same inputs →
same world state.
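
The property the client relies on, as a tiny self-contained check using
the rand 0.8 / rand_pcg 0.3 crates pinned in Cargo.toml:

use rand::{Rng, SeedableRng};
use rand_pcg::Pcg64;

#[test]
fn same_seed_same_sequence() {
    let seed: u64 = 0x5EED;
    let mut server_rng = Pcg64::seed_from_u64(seed);
    let mut client_rng = Pcg64::seed_from_u64(seed); // the JS BigInt port must match this too
    for _ in 0..1_000 {
        assert_eq!(server_rng.gen::<u64>(), client_rng.gen::<u64>());
    }
}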

7.5 The discipline layer: SIM_SPEC.md

schema/SIM_SPEC.md codifies “what you cannot do in simulation code”:

  • No Math.random / Math.sin / Math.cos / Math.atan2 / Date.now /
    performance.now / setTimeout in js/sim/.
  • No rand::random / Instant::now / tokio::sleep in server/src/sim/.
  • All RNG is state.rng. All time is state.tick.
  • Anything in [prediction.relevant_fields] must be Fxp, not f32.

A planned lint step (ESLint rule for the JS side, a cargo deny style
check for Rust) enforces these structurally. For v1 it’s code review +
the parity harness. The harness is the safety net; the spec is the
upstream filter.

7.6 The parity harness

The CI step that proves the two simulations agree. From the planning
docs, the harness:

  1. Generates a fixture: seed + initial GameState + input sequence + tick
    count.
  2. Runs Rust simulate_tick over the fixture and snapshots the
    canonical state at every tick.
  3. Runs JS simulateTick over the same fixture and snapshots its state.
  4. Diffs the prediction-relevant fields. Any drift fails CI.

Fixtures live in schema/snapshots/. The Rust side has parity tests in
server/tests/parity_*.rs: parity_asteroid.rs, parity_bullet.rs,
parity_collision.rs, parity_drops.rs, parity_enemy.rs,
parity_enemy_bullet.rs, parity_vectors.rs, parity_wave.rs,
pcg64_trace.rs. Each one drives a small scenario through the Rust
simulator and asserts the output matches the recorded JS-side snapshot
byte-for-byte.
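
The shape of one such test, heavily hedged: load_fixture, initial_state,
prediction_relevant, and the fixture filename are hypothetical stand-ins
for whatever the real harness uses.

#[test]
fn ship_parity_against_js_fixture() {
    let fixture = load_fixture("schema/snapshots/ship_basic.json");
    let mut state = fixture.initial_state();
    let mut rng = sim_rng::from_seed(fixture.seed);
    let mut events = Vec::new();

    for (tick, inputs) in fixture.inputs.iter().enumerate() {
        simulate_tick(&mut state, inputs, 1.0 / 60.0, &mut rng, &mut events);
        // Compare only the prediction-relevant subset, byte for byte.
        assert_eq!(
            prediction_relevant(&state),
            fixture.expected[tick],
            "diverged from the JS reference at tick {tick}"
        );
        events.clear();
    }
}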

This is the test suite that holds the project together. Without it,
the two simulations drift silently and you only find out months later
when a player reports “my ship lagged for a second and then teleported”.
With it, every PR that touches simulation code has to either pass the
existing fixtures or ship updated fixtures — which forces the JS and
Rust changes to land in the same commit.


8. The connection lifecycle

A WebSocket arrives. What happens?

From server/src/server/connection.rs:

8.1 Spawn the writer task

let (mut ws_tx, mut ws_rx) = ws.split();
let (out_tx, mut out_rx) = mpsc::channel::<ServerMsg>(OUTBOUND_BUFFER);
let writer = tokio::spawn(async move {
    while let Some(msg) = out_rx.recv().await {
        let bytes = match codec::encode_server(&msg) { ... };
        if ws_tx.send(Message::Binary(bytes)).await.is_err() { break; }
    }
});

The connection has two halves: read and write. We hand the write half to
a dedicated task that drains an mpsc and writes binary frames. The mpsc
sender (out_tx) is what we hand to the room — that’s how the room
“talks to” a connection without ever knowing it’s a WebSocket.

This is a tiny pattern but it has compounding benefits: rooms broadcast
to N players via N mpsc::Sender<ServerMsg> clones, never blocking on
actual network I/O. The bounded mpsc is the natural backpressure point.

8.2 Hello with timeout

let hello = match tokio::time::timeout(HELLO_TIMEOUT, read_hello(&mut ws_rx)).await {
    Ok(Ok(h)) => h,
    _ => { drop(out_tx); let _ = writer.await; return; }
};

Three seconds. If the client doesn’t send a Hello in that window we
drop the connection. This protects against connection-leak DoS (someone
opening sockets and never speaking).

8.3 Version check

if !protocol::is_compatible(wire_version, sim_version) {
    let _ = out_tx.send(ServerMsg::Error {
        code: ErrCode::Version,
        msg: format!("server v{}/{}", WIRE_VERSION, SIM_VERSION),
    }).await;
    drop(out_tx); let _ = writer.await;
    return;
}

Two version numbers travel together: WIRE_VERSION (bumped when the byte
layout changes) and SIM_VERSION (bumped when simulation rules change in
ways that affect deterministic replay). The client uses these to decide
“do I match this server or do I need to update the page?”.

Note the careful drop(out_tx); writer.await dance: we want the error
frame to flush to the client before the socket closes. Dropping the
sender ends the writer’s channel; awaiting it ensures the last frame
made it out the door.

8.4 Reattach with session UUIDs

This is the most subtle piece. From connection.rs:105:

let player_id = match hello_session.and_then(|sid| sessions.take_alive(&sid)) {
    Some(SessionEntry { player_id, room, .. }) => {
        if room.send(RoomInbound::Reattach {
            player_id, display_name: display_name.clone(), out: out_tx.clone(),
        }).await.is_ok() {
            current_room = Some(room);
            // counter "ok"
        } else {
            // counter "not ok" — fresh start
        }
        player_id
    }
    None => PlayerId::new(),
};

The flow:

  1. Client first connects → server issues session: Uuid in Welcome,
    client persists in localStorage.
  2. Client disconnects (closed laptop, fell off WiFi). Server’s room
    marks the slot as “in grace” — keeps the ship in the game world for
    60 seconds.
  3. Client reconnects → sends Hello { session: Some(uuid) }.
  4. Server looks up uuid in its SessionRegistry. If the entry is still
    alive (room still exists, grace timer hasn’t fired), the server tells
    the room to reattach the player to the same slot.
  5. Room replaces the dead out_tx with the new one, sends a fresh
    RoomJoined, clears the grace timer.

What this gives the player: briefly losing the network doesn’t lose your run. The ship freezes for a moment, then resumes. From the other
players’ point of view nothing happened — the ship just looked AFK for
a few seconds.

The implementation cost of this feature is small but the design
constraints are real. The PlayerId has to persist across the socket
death. The room has to not clean up the player’s slot on disconnect;
it converts the slot to a grace state instead:

fn handle_disconnect(&mut self, player_id: PlayerId) {
    let now = crate::util::time::now_ms();
    self.grace.insert(player_id, GraceTimer {
        started_at_ms: now,
        deadline_ms: now + self.cfg.grace_secs * 1000,
    });
}

And the per-tick reap_grace() cleans up players whose grace expired:

fn reap_grace(&mut self) {
    let now = crate::util::time::now_ms();
    let expired: Vec<PlayerId> = self.grace.iter()
        .filter(|(_, t)| now >= t.deadline_ms)
        .map(|(id, _)| *id).collect();
    for id in expired {
        self.grace.remove(&id);
        self.handle_leave(id, LeaveReason::GraceExpired);
    }
}

If you build a multiplayer game today, build grace reconnect from day one. Mobile connections drop constantly; players will rage-quit a game
they think crashed when in fact they could have rejoined.

8.5 The main message loop

loop {
    tokio::select! {
        biased;
        frame = ws_rx.next() => {
            // ... decode and route ClientMsg ...
        }
        _ = ping.tick() => {
            // ... emit periodic Ping for RTT ...
        }
    }
}

select! competes the inbound frame stream against a 5-second ping
timer. The ping is ServerMsg::Ping { client_t, server_t }; the client
echoes back ClientMsg::Pong { client_t, server_t } and the server
records the RTT histogram. This is also a passive liveness check — if
we can’t write Ping because the outbound queue is full, we break.

The frame router (connection.rs:166) is a pattern match on
(decoded_msg, current_room_option):

match (&msg, &current_room) {
    (ClientMsg::Input { tick, packed }, Some(room)) => {
        let _ = room.send(RoomInbound::Input { ... }).await;
    }
    (ClientMsg::Ack { snapshot_tick }, Some(room)) => { ... }
    (ClientMsg::Pong { ... }, _) => { ... }
    (ClientMsg::LeaveRoom, Some(room)) => { ... current_room = None }
    (ClientMsg::QuickMatch, _) | (ClientMsg::BrowseRooms, _) | ... => {
        if current_room.is_none() {
            if let Some(handle) = mm.handle(...).await {
                current_room = Some(handle);
            }
        }
    }
    _ => {}
}

A few patterns worth pointing out:

  • In-room messages need a room. Input/Ack are silently dropped
    when there’s no room. The client should not be sending them in that
    state and we don’t crash on the misbehavior.
  • Matchmaking messages need no room. You can’t QuickMatch while
    you’re already in one — the room-creation flow is gated by
    current_room.is_none().
  • LeaveRoom clears current_room on receipt, not on the room’s reply. This means subsequent inputs while waiting for the room to
    acknowledge the leave are dropped. Slightly wasteful but harmless.

8.6 Clean disconnect → session preserved

if let Some(room) = current_room.as_ref() {
    let _ = room.send(RoomInbound::Disconnected { player_id }).await;
    let entry = SessionEntry {
        player_id,
        room: room.clone(),
        expires_at_ms: now_ms() + cfg.grace_secs * 1000,
    };
    sessions.register(session, entry);
}

When the loop ends (clean close or read error), we tell the room “this
player just disconnected” and we register the session for possible
reattach. The session entry holds a clone of the RoomHandle, so when
the client reconnects we can immediately send Reattach.


9. Matchmaking, briefly

The matchmaker is the simplest actor in the server: it owns a map of
public rooms and a few policies (Quick Match picks the smallest non-full
public room; BrowseRooms returns a Vec<RoomSummary>; CreateRoom spawns
a new Room::spawn and adds it to the registry; Join-by-Code looks up
by short code).

The interesting detail is RoomInbound::Summary:

RoomInbound::Summary { reply: oneshot::Sender<RoomSummarySnap> },

When the matchmaker assembles a RoomList, it iterates the room registry
and fires one Summary query at each room. The room replies via the
one-shot channel with a cheap { players, wave } snapshot — no
GameState cloning, no entity collection traversal. The matchmaker
joins all the replies and returns them.

This pattern — “ask the actor for a tiny snapshot via one-shot reply”
— is the canonical way to get cross-actor state in a system that owes
its sanity to “nobody else touches the state”. Don’t reach into a room
to read its wave; ask the room what its wave is.
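
A hedged sketch of the ask pattern from the matchmaker’s side. The
sequential loop and error handling here are simplifications; the real
matchmaker can fan the queries out and join them.

async fn collect_room_list(rooms: &[RoomHandle]) -> Vec<RoomSummarySnap> {
    let mut summaries = Vec::with_capacity(rooms.len());
    for room in rooms {
        let (reply_tx, reply_rx) = tokio::sync::oneshot::channel();
        // Ask the actor; it answers on the one-shot channel and we never
        // touch its state directly.
        if room.send(RoomInbound::Summary { reply: reply_tx }).await.is_ok() {
            if let Ok(snap) = reply_rx.await {
                summaries.push(snap);
            }
        }
    }
    summaries
}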


10. The client side, in brief

The client engine work is the subject of the companion document
docs/Multiplayer Rust Client Engine – 2026-05-07.md. A compressed
summary so you can see how the two halves meet:

  • js/sim/ is the mirror of server/src/sim/. Same module names,
    same algorithms, same fixed-point math for the prediction-relevant
    fields. Used by both solo play and online prediction.
  • js/net/ws-client.js owns the WebSocket. Reconnects with
    exponential backoff. Persists the session UUID in localStorage.
  • js/net/prediction.js runs the local ship through simulateTick
    every frame using the local input. Stores a circular buffer of the
    last ~120 ticks of “(input, predicted state)” pairs.
  • When a snapshot arrives, prediction does the rollback-and-replay (see
    the sketch after this list): rewind the local ship to the snapshot’s
    tick, snap to the server’s
    position/velocity, then replay all the inputs since that tick. If the
    prediction agreed with the server, the replay produces the same
    current state and nothing visible happens. If they disagreed, the ship
    shifts — but only by however far the prediction was wrong, which is
    usually millimeters per network blip.
  • js/net/interpolation.js lerps remote entities (other ships,
    enemies, asteroids, drops) between the two most recent snapshots.
    Render time is delayed by one snapshot interval (~50 ms) so we always
    have two snapshots to lerp between rather than extrapolating past.
  • js/net/event-firehose.js drains ServerMsg::Event frames and
    dispatches them to the existing presentation layer (audio, particles,
    damage numbers). The simulation never touches the audio manager
    directly; the event is the bridge.
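
A hedged sketch of that reconciliation step, written in Rust to stay
consistent with the other examples; the real implementation is
js/net/prediction.js. ShipState and step_ship are stand-ins for the local
ship state and the pure per-tick ship update.

use std::collections::VecDeque;

#[derive(Copy, Clone)]
struct PredictedFrame {
    tick: u32,
    input: PackedInput,
    ship: ShipState, // predicted local-ship state *after* applying `input`
}

struct Predictor {
    history: VecDeque<PredictedFrame>, // roughly the last 120 ticks
}

impl Predictor {
    fn reconcile(&mut self, snapshot_tick: u32, server_ship: ShipState) -> ShipState {
        // Drop everything the server has already confirmed.
        while self.history.front().map_or(false, |f| f.tick <= snapshot_tick) {
            self.history.pop_front();
        }
        // Snap to the server's view, then replay the unconfirmed inputs on top of it.
        let mut ship = server_ship;
        for frame in self.history.iter_mut() {
            ship = step_ship(ship, frame.input);
            frame.ship = ship; // keep the stored prediction consistent for the next pass
        }
        // If prediction and server agreed, this equals the current predicted state
        // and the correction is invisible; otherwise the ship shifts by the error.
        ship
    }
}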

The mode flag on the engine (solo vs online) is the only place that
decides “run prediction or run pure-local-sim”. Everything else — the
renderer, the audio, the input capture, the wave UI — is mode-agnostic.


11. Things we learned (and you will too)

A grab-bag of lessons that aren’t obvious from reading the code:

Pure functions buy you everything. Once simulate_tick is a pure
function of (state, inputs, dt, rng, events), you can test it, replay
it, port it, run it as a benchmark, and reason about it without
involving the entire async runtime. Every subsequent multiplayer feature
gets easier the cleaner this function is.

Make wire format boring. Bincode with with_fixint_encoding is
boring. Every length is u64, every enum tag is u32, every integer is
its declared width. Boring decoders are correct decoders.

Pick the smallest deterministic subset. Trying to make the entire
simulation deterministic across two languages is a doomed project. Pick
the few fields the client actually predicts — for us, ship and bullet
spawn positions/velocities — and lock those down with fixed-point math
and polynomial trig. Let everything else use native floats and
interpolate.

Last-write-wins is the right default. Buffering inputs creates
endless edge cases (clock drift, queue overrun, “what if the buffer
empties”). Latest input every tick is simple, correct, and good enough
for any game where players aren’t doing frame-precise inputs.

Snapshots are state; events are instants. Don’t try to send “enemy
destroyed” inside a snapshot. Snapshots are for what is in the world right now; events are for what just happened. They have different
delivery requirements (snapshots can be lossy, events cannot) and
mixing them confuses both sides.

Grace reconnect changes the player experience. Mobile network
flapping is constant. A game that survives a 30-second blackout is a
game the player keeps playing. The implementation cost is one
HashMap<PlayerId, GraceTimer> and a reap_grace() call in the tick
loop.

Versions are cheap and irreversible mistakes are expensive. Two u16s
in your Hello — wire_version, sim_version — cost you nothing to add
and save you the day a wire layout change ships before a client update
does.

Metrics from day one. rainboids_tick_duration_seconds,
rainboids_snapshot_size_bytes, rainboids_rtt_ms, rainboids_rooms_active,
rainboids_players_online. The day production has a problem is not the day
you have time to add instrumentation. Instrument now; have the dashboard
waiting.

Single static binary. cargo build --release produces one ELF; you
copy it to a VPS and run it. No node_modules, no runtime version pins,
no production-only dependency failures. The boring deploy is the deploy
you can do at 2 AM.


12. What we haven’t built yet (and what it teaches)

This is a v1 plan in mid-implementation. The gaps are instructive:

  • Delta-encoded snapshots. Currently every snapshot is the full
    payload. base_tick: Option<u32> is plumbed for delta-from-acked-tick
    but unimplemented. The optimization saves ~70% of bandwidth in
    steady state.
  • bytes::BytesMut broadcast. Currently we re-encode the snapshot
    per recipient. Switching to a ref-counted Bytes allows one-encode-
    N-transmit; small N today, real win at scale.
  • Codegen for the protocol. v1 hand-mirrors the schema between
    Rust and JS. v2 reads schema/protocol.toml and emits both sides.
  • Boss fights, powerups, revive interactions. Plumbed in the
    protocol but not wired into the Rust simulation; some live only as
    parity tests against the JS reference.
  • Horizontal sharding. One process for v1. The plan calls for nginx
    in front and room_id cookie routing once we outgrow one box.

The discipline of writing down “what we’re not building yet” is itself
load-bearing. It tells you what shape your interfaces have to take so
the upgrade doesn’t break the wire — that’s why base_tick is in the
struct even though nothing reads it yet.


13. If you want to build something like this

A suggested implementation order, calibrated for one engineer working
solo with experience in JS but limited Rust:

  1. Extract simulateTick from your existing JS engine. No multiplayer
    work; just refactor until you have a pure function. This usually takes
    1–2 weeks for a real game; it pays for itself before the network
    layer is even written.
  2. Build the event queue. Particles, audio, damage numbers all move
    from inline calls to events drained by a presentation layer. Solo
    play behaves identically.
  3. Seed your RNG. Replace Math.random() with state.rng.next()
    using PCG-64. Verify solo replay still feels normal.
  4. Stand up the Rust crate. Empty simulate_tick that does nothing.
    axum WS endpoint, bincode codec, a room actor that broadcasts empty
    snapshots. Get a client connected and the wire flowing.
  5. Port one subsystem. Ship physics is the right one — small,
    self-contained, and on the prediction-relevant path. Write the parity
    test before you write the Rust code. Iterate until JS and Rust agree
    bit-for-bit on a 1,000-tick fixture.
  6. Add fixed-point math. Convert Ship.x/y/vx/vy to Fxp. Ship the
    change to the JS side simultaneously. Re-run the parity test.
  7. Port the remaining subsystems (asteroid, enemy, bullet, wave,
    drops, collision) one at a time, each with its own parity fixture.
    Order them by dependency.
  8. Wire client prediction. Read input locally, predict the local
    ship via simulateTick, reconcile against snapshots. Don’t predict
    anything else.
  9. Wire snapshot interpolation. Remote ships, enemies, asteroids
    render at now - snapshot_interval so you have two snapshots to lerp.
  10. Add grace reconnect. Session UUID, registry, room reattach path.
  11. Matchmaking, lobby UX, polish. The visible half of the project.
  12. Operationalize. Metrics dashboard, structured logs, systemd unit,
    nginx reverse proxy.

Each step has a clear success criterion. Each step is independently
testable. By the time you’re at step 12 you have a working, observable,
operable multiplayer game.


14. Closing

Authoritative-server multiplayer is a 30-year-old design. The interesting
work in 2026 isn’t reinventing it; it’s paying for the implementation
twice — once in your client language for prediction and solo play, once
in your server language for authority — while keeping them honest.

The two pieces that make our version work:

  1. A pure-function simulation that can be ported, tested, replayed,
    and parity-checked.
  2. A schema as third-party arbiter — a TOML file that neither
    language reads at runtime but both have to match, plus a CI harness
    that runs the same fixtures through both implementations and fails
    any diff.

If you want bounded tail latency, a single static binary, and CPU
headroom for years of feature growth, you pay the duplication cost and
get Rust. If you want zero duplication and a faster dev loop, you write
the server in your client language and accept the trade-offs of the
single-language path.

There is no architecture that costs nothing. The interesting question is
which costs you’d rather pay.


Further reading inside this repo:

  • docs/Multiplayer Planning – 2026-05-06.md — the original design doc.
  • docs/Multiplayer Rust Server – 2026-05-07.md — Rust server deep dive.
  • docs/Multiplayer Rust Client Engine – 2026-05-07.md — JS client engine
    refactor and prediction details.
  • docs/Multiplayer Wire Format – 2026-05-09.md — byte-level codec spec.
  • schema/SIM_SPEC.md — discipline rules for the dual simulation.
  • schema/protocol.toml — the third-party arbiter.
  • server/src/room/mod.rs — the per-room actor and tick loop.
  • server/src/server/connection.rs — the WS lifecycle, hello, reattach.
  • server/src/sim/ — authoritative simulation, side by side with js/sim/.
