
ADR 17: Multi-Tenant PSK Injection via Secure WebSockets (WSS)

Status: Accepted
Date: 2026-04-24

Context

Substratum's architecture relies on absolute data sovereignty via a private, identity-gated swarm (see ADR 03: Network Behavior). Every user is assigned a unique 256-bit Swarm PSK derived from their User_DID (together with a Master_Secret, as described under Decision). Sidecars (local agents) must connect to the Global Triangle nodes (Gateway) using this specific PSK to get through the libp2p-pnet handshake.

When the Gateway operates as a multi-tenant node, it faces a fundamental handshake paradox over raw TCP: it doesn't know which user is connecting, so it doesn't know which PSK to use to decrypt the initial connection.

Spawning a separate libp2p Swarm on a unique port for every user is operationally expensive: it exhausts cloud load balancer rules, memory, and file descriptors. Sending the User_DID in plaintext over a raw TCP socket leaks identity to network observers.

We need a scalable, single-swarm solution that maintains the "two-lock" security model (Network Isolation + Identity-Gated Access) without breaking standard libp2p protocols or leaking metadata.

Decision

We will use Secure WebSockets (wss://) as the underlying transport layer for libp2p connections between Sidecars and the Gateway, using an HTTP Upgrade path to dynamically inject the user-specific PSK.

  1. The TLS Wrapper (Lock 1): Sidecars initiate a standard HTTPS connection to a user-specific endpoint: wss://gateway.substratum.cloud/swarm/{User_DID}. The connection is secured by standard TLS on port 443. Because the request path travels inside the encrypted channel, the User_DID is hidden from network observers.
  2. The HTTP Upgrade (Axum): The Gateway's Axum web server intercepts the request. It extracts the {User_DID} from the path and derives the expected Swarm_PSK = SHA256(Master_Secret + User_DID).
  3. The PSK Injection (Lock 2): Axum accepts the WebSocket upgrade. Before handing the raw WebSocket stream over to the central rust-libp2p Swarm, it wraps the stream in a libp2p-pnet layer initialized with that specific user's Swarm_PSK.
  4. The Single Swarm: The Gateway runs a single libp2p Swarm shared by all tenants. By the time a connection reaches the Swarm, TLS has been terminated and the user-specific pnet layer has already been applied at the WebSocket boundary, so the connection arrives individually authenticated and decrypted.
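
Steps 1 and 2 can be sketched in plain Rust as follows. This is a minimal, dependency-free illustration rather than the Gateway's actual code: the function names are hypothetical, the `+` in the formula is read as byte concatenation (an assumption), percent-decoding of the path is ignored, and the SHA-256 function is passed in as a parameter so the sketch compiles without a hashing crate. In a real Axum handler the DID would more likely come from a path extractor.

```rust
/// Extracts the `{User_DID}` segment from a request path of the form
/// `/swarm/{User_DID}`. Returns `None` for any other shape
/// (wrong prefix, empty DID, or extra path segments).
/// Hypothetical helper; percent-decoding is deliberately left out.
fn extract_user_did(path: &str) -> Option<&str> {
    let did = path.strip_prefix("/swarm/")?;
    if did.is_empty() || did.contains('/') {
        return None;
    }
    Some(did)
}

/// Computes Swarm_PSK = SHA256(Master_Secret + User_DID).
/// Assumption: `+` means byte concatenation, secret first.
/// `sha256` is injected by the caller so that this sketch has no
/// external dependencies.
fn derive_swarm_psk<H>(master_secret: &[u8], user_did: &str, sha256: H) -> [u8; 32]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    let mut input = Vec::with_capacity(master_secret.len() + user_did.len());
    input.extend_from_slice(master_secret);
    input.extend_from_slice(user_did.as_bytes());
    sha256(&input)
}
```

The 32-byte result matches the key size a 256-bit PSK requires. Rejecting malformed paths before any key derivation keeps the upgrade handler from doing cryptographic work for requests that cannot belong to a tenant.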

Consequences

Positive

  • Infrastructure Scalability: All traffic flows over standard port 443. We can use standard cloud load balancers (ALB/HTTPS) without complex NLB setups or dynamic port provisioning.
  • Resource Efficiency: We run a single libp2p Swarm, minimizing memory, thread usage, and file descriptors compared to spinning up thousands of isolated Swarms.
  • Privacy: The User_DID is never sent in plaintext. It is protected by the outer TLS layer.
  • Protocol Compliance: rust-libp2p natively supports WebSocket transports. We do not need to fork or invent custom cryptographic handshake protocols; we just compose standard TLS, WSS, and pnet.

Negative

  • WebSocket Overhead: WebSockets add a small framing overhead compared to raw TCP sockets (2 to 14 header bytes per frame), slightly increasing bandwidth usage for Bitswap block transfers.
  • Proxy Timeouts: Some restrictive corporate firewalls, or load balancers with short idle timeouts, may drop long-lived WebSocket connections. We will need robust Ping and keep-alive behaviors in the SubstratumBehaviour.
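
To put the WebSocket Overhead point in numbers, the header size of a single frame can be computed from the framing rules in RFC 6455, section 5.2. The sketch below is illustrative only: the function names are hypothetical, and it assumes one unfragmented frame per payload, whereas the actual frame sizes depend on how the libp2p stack chunks its writes.

```rust
/// Header bytes added to one WebSocket frame (RFC 6455, section 5.2).
/// `masked` is true for client-to-server frames, which carry a
/// 4-byte masking key. Hypothetical helper for estimation only.
fn ws_frame_header_len(payload_len: u64, masked: bool) -> u64 {
    // 2 base bytes, plus an extended length field for larger payloads.
    let extended_len = if payload_len <= 125 {
        0
    } else if payload_len <= 65_535 {
        2
    } else {
        8
    };
    let mask_key = if masked { 4 } else { 0 };
    2 + extended_len + mask_key
}

/// Framing overhead as a fraction of the payload size,
/// assuming a single unfragmented frame and a non-empty payload.
fn ws_overhead_ratio(payload_len: u64, masked: bool) -> f64 {
    ws_frame_header_len(payload_len, masked) as f64 / payload_len as f64
}
```

For example, under these assumptions a 256 KiB payload sent from a Sidecar as one masked frame carries 14 header bytes, while a 100-byte control message sent the same way carries 6. The relative cost therefore matters mostly for small, chatty messages rather than for large block transfers.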