ADR 17: Multi-Tenant PSK Injection via Secure WebSockets (WSS)
Status: Accepted
Date: 2026-04-24
Context
Substratum's architecture relies on absolute data sovereignty via a private, identity-gated swarm (see ADR 03: Network Behavior). Every user is assigned a unique 256-bit Swarm PSK derived from their User_DID. The Sidecars (local agents) must connect to the Global Triangle nodes (Gateway) using this specific PSK to decrypt the libp2p-pnet handshake.
When the Gateway operates as a multi-tenant node, it faces a fundamental handshake paradox over raw TCP: it doesn't know which user is connecting, so it doesn't know which PSK to use to decrypt the initial connection.
Options like spawning a separate libp2p Swarm on a unique port for every user are operationally expensive (exhausting cloud load balancer rules, memory, and file descriptors). Options like sending the User_DID in plaintext over a raw TCP socket leak identity to network observers.
We need a scalable, single-swarm solution that maintains the "two-lock" security model (Network Isolation + Identity-Gated Access) without breaking standard libp2p protocols or leaking metadata.
Decision
We will use Secure WebSockets (wss://) as the underlying transport layer for libp2p connections between Sidecars and the Gateway, using an HTTP Upgrade path to dynamically inject the user-specific PSK.
- The TLS Wrapper (Lock 1): Sidecars initiate a standard HTTPS connection to a user-specific endpoint, wss://gateway.substratum.cloud/swarm/{User_DID}. The connection is secured by standard TLS on port 443, hiding the User_DID from network observers.
- The HTTP Upgrade (Axum): The Gateway's Axum web server intercepts the request. It extracts the {User_DID} from the path and derives the expected Swarm_PSK = SHA256(Master_Secret + User_DID).
- The PSK Injection (Lock 2): Axum accepts the WebSocket upgrade. Before handing the raw WebSocket stream over to the central rust-libp2p Swarm, it wraps the stream in a libp2p-pnet layer initialized with that specific user's Swarm_PSK.
- The Single Swarm: The Gateway runs a single, highly efficient libp2p Swarm. By the time a connection reaches the Swarm, it has already been individually authenticated and decrypted at the WebSocket boundary.
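The HTTP Upgrade step can be sketched as follows. This is a minimal illustration in Python rather than the Gateway's Rust code, and it rests on several assumptions not stated in this ADR: the function names are hypothetical, the "+" in the formula is read as byte concatenation, the User_DID is UTF-8 encoded, and the only path validation shown is a non-empty segment without slashes. The key-file text layout is the one used by libp2p private networks.

```python
import hashlib

SWARM_PREFIX = "/swarm/"  # path prefix taken from the endpoint in this ADR


def extract_user_did(path: str) -> str:
    """Pull the {User_DID} segment out of /swarm/{User_DID} (hypothetical helper)."""
    if not path.startswith(SWARM_PREFIX):
        raise ValueError("not a swarm endpoint")
    user_did = path[len(SWARM_PREFIX):]
    # Assumed minimal validation: one non-empty path segment.
    if not user_did or "/" in user_did:
        raise ValueError("malformed User_DID segment")
    return user_did


def derive_swarm_psk(master_secret: bytes, user_did: str) -> bytes:
    """Swarm_PSK = SHA256(Master_Secret + User_DID), '+' read as concatenation."""
    return hashlib.sha256(master_secret + user_did.encode("utf-8")).digest()


def to_pnet_key_file(psk: bytes) -> str:
    """Render a 32-byte PSK in the libp2p private-network key-file layout."""
    if len(psk) != 32:
        raise ValueError("pnet expects a 256-bit key")
    return "/key/swarm/psk/1.0.0/\n/base16/\n" + psk.hex() + "\n"


if __name__ == "__main__":
    # Placeholder secret for illustration only; never hard-code a real one.
    master_secret = bytes(32)
    did = extract_user_did("/swarm/did:example:alice")
    print(to_pnet_key_file(derive_swarm_psk(master_secret, did)))
```

Because the derivation is deterministic, the Gateway does not need to store per-user keys: it recomputes the PSK from the path on every upgrade request, and a Sidecar holding a different PSK fails at the pnet layer.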
Consequences
Positive
- Infrastructure Scalability: All traffic flows over standard port 443. We can use standard cloud load balancers (ALB/HTTPS) without complex NLB or dynamic port provisioning.
- Resource Efficiency: We run a single libp2p Swarm, minimizing memory, thread usage, and file descriptors compared to spinning up thousands of isolated Swarms.
- Privacy: The User_DID is never sent in plaintext. It is protected by the outer TLS layer.
- Protocol Compliance: rust-libp2p natively supports WebSocket transports. We do not need to fork or invent custom cryptographic handshake protocols; we just compose standard TLS, WSS, and pnet.
Negative
- WebSocket Overhead: WebSockets introduce a small framing overhead compared to raw TCP sockets, slightly increasing bandwidth usage for Bitswap block transfers.
- Proxy Timeouts: Some restrictive corporate firewalls or aggressive load balancers may drop long-lived WebSocket connections. We will need robust Ping and keep-alive behaviors in the SubstratumBehaviour.
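Both negative consequences can be roughly sized. The sketch below, in Python for brevity, computes WebSocket header overhead from the frame layout in RFC 6455 and picks a ping interval below the shortest idle timeout on the path. The function names, the 256 KiB block size, the 16 KiB frame payload, the timeout values, and the 0.5 safety factor are illustrative assumptions, not measurements or settings from Substratum.

```python
def ws_frame_header_len(payload_len: int, masked: bool) -> int:
    """Header size in bytes of one WebSocket frame (RFC 6455, section 5.2)."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    header = 2          # FIN/opcode byte plus mask-bit/7-bit-length byte
    if payload_len > 0xFFFF:
        header += 8     # 64-bit extended payload length
    elif payload_len > 125:
        header += 2     # 16-bit extended payload length
    if masked:
        header += 4     # masking key, required on client-to-server frames
    return header


def ws_framing_overhead(total_bytes: int, frame_payload: int, masked: bool) -> int:
    """Total header bytes when total_bytes is split into frames of frame_payload."""
    if frame_payload <= 0:
        raise ValueError("frame_payload must be positive")
    full_frames, remainder = divmod(total_bytes, frame_payload)
    overhead = full_frames * ws_frame_header_len(frame_payload, masked)
    if remainder:
        overhead += ws_frame_header_len(remainder, masked)
    return overhead


def keepalive_interval_s(idle_timeouts_s, safety_factor=0.5):
    """Ping interval that stays under the shortest idle timeout on the path."""
    if not idle_timeouts_s:
        raise ValueError("need at least one idle timeout")
    if not 0 < safety_factor < 1:
        raise ValueError("safety_factor must be between 0 and 1")
    return min(idle_timeouts_s) * safety_factor


if __name__ == "__main__":
    block = 256 * 1024   # assumed block size in bytes
    frame = 16 * 1024    # assumed payload bytes per WebSocket frame
    extra = ws_framing_overhead(block, frame, masked=True)
    print(f"{extra} header bytes, {100 * extra / block:.3f}% of the block")
    # Assumed idle timeouts (seconds) for a load balancer and a proxy.
    print(f"ping every {keepalive_interval_s([60, 350])} s")
```

Under these assumptions the framing cost is 128 bytes on a 262,144-byte block, about 0.05%, which supports calling the overhead small. The ping interval, by contrast, depends entirely on the shortest idle timeout between Sidecar and Gateway, so it is better treated as configuration than as a constant.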