ADR 41: Operated PDS Hosting Topology and AT Protocol Plane Separation

Status: Proposed
Date: 2026-06-11
Last Updated: 2026-06-11 (stateless droplet; Postgres + Spaces DR; portal UI per ADR 42)

Terms (this ADR)

| ID | Term | Meaning |
|---|---|---|
| Rn | Functional requirement | Numbered obligation (R1–R12 in this ADR). |
| NRn | Non-functional requirement | Quality attribute (NR1–NR7). |
| Cn | Constraint | Non-negotiable boundary (C1–C7). |
| CCn | Cross-cutting challenge | Risk spanning components (CC1–CC5). |
| Operated PDS | Substratum-hosted PDS | Tranquil PDS at pds.substratum.cloud for Substratum login handles; distinct from customer BYO PDS. |
| Small World | AT Protocol write plane | User PDS: repo commits, OAuth, identity; hosts only that user's signed records. |
| Big World | AT Protocol read/aggregate plane | Relays, AppViews, and Substratum gateway catalog; heavy indexing and product reads. |
| Repo store (Tranquil) | Postgres tranquil_pds | Managed-cluster database (infra/data): accounts, MST/repo metadata, OAuth state. |
| Garage phase | Product rollout band | ~0–500 Substratum-login customers (Garage v1 rollout); single operated PDS is sufficient. |
| Entryway | Multi-PDS shard router | Bluesky fleet pattern for routing handles to shard hosts; deferred for Substratum until multi-host fleet. |

Canonical product vocabulary: Glossary.

Context

Substratum operates pds.substratum.cloud for customer Substratum login (ADR 37, Business model). That host has real COGS, durability, and scaling constraints distinct from the gateway SaaS plane.

AT Protocol plane separation

Bluesky scales to tens of millions of users by keeping the PDS small and pushing heavy work elsewhere:

| Layer | Bluesky @ scale | Substratum Garage v1 |
|---|---|---|
| Small World (writes) | Per-user repo on PDS hosts; commit → relay | Tranquil PDS; cloud.substratum.* repo writes via authz proxy |
| Big World (reads) | AppViews index firehose for feeds/search | Gateway Postgres catalog + blockstore/mesh for file-explorer UX |
| Blobs | Enterprise S3 | S3-compatible object storage (DO Spaces) via PDS_BLOBSTORE_S3_* |
| Social AppView | api.bsky.app | Still used for federated handle resolution (ATPROTO_APPVIEW_URL); not Substratum's product read plane |

ADR 28 already defines Substratum's two planes of truth (catalog vs mesh/PDS). This ADR names the hosting and storage decisions that keep the operated PDS in the Small World role while the gateway remains the product read plane — analogous to an AppView for storage, not a replacement for Bluesky's social AppView.

Hosting challenges

  1. Tranquil is the operated upstream — repo and account data in Postgres tranquil_pds on the managed cluster; DO Spaces (S3-compatible) for blobs — same split as production Tranquil deployments.
  2. Droplet vs durable data — disposable compute on the PDS droplet; customer repos survive droplet replace via Postgres + Spaces only.
  3. Garage scale does not need Bluesky fleet topology — sharding, Entryway, and bare-metal fleets are post-Garage concerns; Substratum-login users write lighter repo traffic (metadata lexicons, not full social graphs).
  4. Billing vs hosting — entitlement enforcement on repo mutations is ADR 37; this ADR covers where bits live and how the host survives failure.
  5. Cloud-agnostic posture: S3-compatible blobstore, Postgres repo store (Tranquil), and Postgres entitlements (ADR 09) on managed infrastructure.

Requirements

Functional requirements

| ID | Requirement |
|---|---|
| R1 | Garage v1 SHALL operate one Substratum PDS host at pds.substratum.cloud (reserved IP + DNS) provisioned by infra/pds, separate from marketing/app/admin droplets (ADR 36 pattern: Caddy TLS → authz proxy). |
| R2 | Operated PDS blobstore SHALL use S3-compatible object storage (PDS_BLOBSTORE_S3_*); PDS_BLOBSTORE_DISK_LOCATION MUST NOT be set in production. |
| R3 | Operated PDS repo and account data SHALL live in Postgres database tranquil_pds on the infra/data managed cluster, not on the PDS droplet root disk alone. |
| R4 | Operators SHALL rely on DO managed Postgres automatic daily backups (7-day retention) for tranquil_pds; restore procedures SHALL be in data-deployment.md and pds-deployment.md. Do not enable droplet automated backups or volume snapshot CI for Garage v1. |
| R5 | File-explorer and gateway HTTP APIs SHALL remain the primary read path for drives, listings, and uploads; clients MUST NOT depend on PDS XRPC for catalog UX (ADR 28, ADR 30). |
| R6 | Sync workers (ReceiptSyncWorker, future CatalogSyncWorker) SHALL converge catalog intent onto the operated PDS asynchronously; PDS MUST NOT block ingress on synchronous putRecord for catalog rows. |
| R7 | substratum-pds-authz-proxy SHALL sit on the PDS host in front of gated repo mutations per ADR 37; hosting topology MUST NOT require entitlement logic inside the Tranquil PDS container. |
| R8 | Entitlements, catalog, and tranquil_pds repo data SHALL use infra/data managed Postgres, not co-located ad-hoc stores on the PDS droplet. |
| R9 | PDS stack outputs (Spaces bucket, endpoint, scoped keys, data directory path) SHALL be consumable by operators via pulumi stack output without embedding secrets in cloud-init userData. |
| R10 | Local dev Compose (pds-upstream) SHALL use Tranquil PDS behind pds-authz-proxy with MinIO as S3 stand-in, giving dev/prod parity for the blob path (ADR 13). |
| R11 | Account migration off Substratum PDS (BYO PDS, Bluesky) SHALL remain supported via MOOver / AT migration tooling (pds-account-migration.md); hosting choices MUST NOT preclude export. |
| R12 | Post-Garage multi-shard PDS (Entryway, shard registry, registration-time assignment) SHALL be a follow-on ADR or revision; it is not implemented during Garage v1. |
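
R2 and R3 lend themselves to a pre-deploy configuration check. The sketch below is one possible shape for such a check, not part of the Substratum codebase: `checkPdsEnv` and the `Env` type are hypothetical names, and the assumption that `DATABASE_URL` uses a `postgres://` or `postgresql://` scheme is ours. Only the variable names quoted in this ADR are referenced, and the `PDS_BLOBSTORE_S3_*` family is matched by prefix because the individual keys are not listed here.

```typescript
// Hypothetical pre-deploy check for R2 and R3. Names are illustrative.
type Env = Record<string, string | undefined>;

function checkPdsEnv(env: Env): string[] {
  const problems: string[] = [];

  // R2: the disk blobstore must not be configured in production.
  if (env["PDS_BLOBSTORE_DISK_LOCATION"] !== undefined) {
    problems.push("PDS_BLOBSTORE_DISK_LOCATION must not be set (R2)");
  }

  // R2: at least one non-empty PDS_BLOBSTORE_S3_* setting must be present.
  const hasS3 = Object.keys(env).some(
    (key) => /^PDS_BLOBSTORE_S3_/.test(key) && (env[key] ?? "") !== ""
  );
  if (!hasS3) {
    problems.push("no PDS_BLOBSTORE_S3_* settings found (R2)");
  }

  // R3: repo and account data live in Postgres.
  // Assumption: DATABASE_URL carries a postgres:// or postgresql:// scheme.
  const dbUrl = env["DATABASE_URL"] ?? "";
  if (!/^postgres(ql)?:\/\//.test(dbUrl)) {
    problems.push("DATABASE_URL must be a Postgres URL (R3)");
  }

  return problems;
}
```

An empty result means the environment passes both checks. A deploy script could refuse to continue when the returned list is non-empty.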

Non-functional requirements

| ID | Requirement |
|---|---|
| NR1 | Garage COGS target: default PDS droplet s-1vcpu-2gb (~$12/mo); blobs billed via Spaces; tranquil_pds on shared managed Postgres. See infra/pds/AGENTS.md sizing table. |
| NR2 | Backup RPO: DO managed Postgres daily backups for tranquil_pds (7-day retention, automatic); document restore in ops runbook. |
| NR3 | Replaceability: droplet reprovision via pulumi up recreates cattle boot disk; redeploy Tranquil with unchanged DATABASE_URL; Postgres and Spaces hold customer data (ADR 37 C10). |
| NR4 | Region alignment: PDS droplet and Spaces bucket SHOULD share region (default nyc3) to minimize latency and egress. |
| NR5 | Security: Spaces blob bucket ACL private; PDS Spaces keys scoped readwrite to blob bucket only. |
| NR6 | PDS hosting decisions MUST remain compatible with twelve-factor IaC (ADR 13, infra/AGENTS.md): config in Pulumi stack keys, thin index.ts, secrets via pulumi config set --secret. |
| NR7 | Pulumi state discipline: production PDS infrastructure changes MUST go through pulumi (local destroy or Spindle refresh/up); operators MUST NOT delete droplet, reserved IP, or firewall in the DigitalOcean console without matching state updates. Stack pds on substratum-pds is the authoritative resource map; drift blocks CI refresh. Greenfield teardown: local pulumi destroy, then Spindle pulumi up. See pds-deployment.md. |
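
NR4 and NR5 can be expressed as a small review check over the stack's placement settings. The following is a minimal sketch under assumptions: `PdsPlacement`, its field names, and `placementFindings` are hypothetical and do not correspond to actual Pulumi stack keys or outputs.

```typescript
// Hypothetical description of where the PDS pieces live. Field names are illustrative.
interface PdsPlacement {
  dropletRegion: string;
  spacesRegion: string;
  bucketAcl: string;
  blobBucket: string;
  keyBuckets: string[]; // buckets the PDS Spaces key can read and write
}

function placementFindings(p: PdsPlacement): string[] {
  const findings: string[] = [];

  // NR4: droplet and bucket SHOULD share a region.
  if (p.dropletRegion !== p.spacesRegion) {
    findings.push("NR4: droplet and Spaces bucket are in different regions");
  }

  // NR5: the blob bucket ACL must be private.
  if (p.bucketAcl !== "private") {
    findings.push("NR5: blob bucket ACL is not private");
  }

  // NR5: the key must be scoped to the blob bucket only.
  const extra = p.keyBuckets.filter((bucket) => bucket !== p.blobBucket);
  if (extra.length > 0) {
    findings.push("NR5: Spaces key also grants access to " + extra.join(", "));
  }

  return findings;
}
```

With both regions set to nyc3, a private ACL, and a key scoped to the blob bucket alone, the function returns an empty list.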

Constraints

| ID | Constraint |
|---|---|
| C1 | Garage v1 MUST NOT deploy Entryway or multiple operated Tranquil backends behind one public hostname. |
| C2 | MUST NOT run two active Tranquil instances against the same DATABASE_URL (Postgres single-writer); HA is backup/restore, not active-active replicas without upstream support. |
| C3 | MUST NOT co-locate tranquil_pds on the PDS droplet disk; use managed Postgres (infra/data). |
| C4 | Gateway blockstore (ADR 08) and PDS blobstore MAY use different buckets/keys; do not conflate safety-net bytes with PDS media blobs. |
| C5 | Private cloud.substratum.* namespace policy (ADR 29) is independent of hosting; registration as PDS-private collections remains a separate workstream. |
| C6 | BYO PDS users MUST NOT be routed through Substratum operated-PDS infrastructure (ADR 37 R13). |
| C7 | Do not enable droplet automated backups or block volume snapshots for PDS DR; customer repos use Postgres tranquil_pds backups (DO managed, automatic) and Spaces blobs. |
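
C2's single-writer rule is easy to violate during a droplet replace, when an old and a new instance may briefly coexist. As an illustration only, the sketch below groups active instances by `DATABASE_URL` and reports any group with more than one member. `TranquilInstance` and `findSharedWriters` are hypothetical names, and how an operator would obtain the instance list is left open.

```typescript
// Hypothetical inventory record for a running Tranquil instance.
interface TranquilInstance {
  name: string;
  databaseUrl: string;
  active: boolean;
}

// Returns one group of instance names per DATABASE_URL that has
// more than one active writer (a C2 violation).
function findSharedWriters(instances: TranquilInstance[]): string[][] {
  // Null-prototype object so arbitrary URL strings are safe as keys.
  const byUrl: Record<string, string[]> = Object.create(null);

  for (const inst of instances) {
    if (!inst.active) continue; // stopped instances do not write
    const names = byUrl[inst.databaseUrl] ?? [];
    names.push(inst.name);
    byUrl[inst.databaseUrl] = names;
  }

  return Object.keys(byUrl)
    .map((url) => byUrl[url] ?? [])
    .filter((names) => names.length > 1);
}
```

An empty result means every active instance has its own database, which for Garage v1 should mean exactly one active instance overall (C1).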

Cross-cutting challenges

| ID | Challenge | Mitigation |
|---|---|---|
| CC1 | Tranquil optional UI cache on droplet boot disk is ephemeral | Recreate on redeploy; repo data remains in Postgres. |
| CC2 | Operators confuse gateway catalog with PDS repo source of truth | ADR 28 invariant: mesh reads follow PDS receipts; catalog may lead during sync. Never treat Postgres ACL alone as mesh authority. |
| CC3 | Droplet replaced without redeploying Tranquil + authz proxy | Spindle deploy-pds.sh redeploys apps from Pulumi outputs after infra up; verify-pds-live.sh gates live traffic. |
| CC4 | Scale past single-host Tranquil limits | Monitor Postgres size, droplet RAM, and Spaces usage; post-Garage: multi-host fleet + Entryway-style router (R12). |
| CC5 | Secrets (PDS_JWT_SECRET, PLC rotation key) lost on rebuild | Store in operator secret manager; document rotation in runbook. |
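
R6 and CC2 both rest on the catalog leading and the PDS converging later. The sketch below illustrates that pattern in miniature and is not the ReceiptSyncWorker implementation: `SyncQueue` and `CatalogIntent` are hypothetical names, and the PDS write is modeled as a synchronous callback that returns success or failure, whereas a real putRecord call would be an asynchronous network request.

```typescript
// Hypothetical record of a catalog change that still needs a PDS write.
interface CatalogIntent {
  rowId: string;
  record: string;
}

class SyncQueue {
  private pending: CatalogIntent[] = [];

  // Ingress path: note the intent and return at once. No PDS call happens here,
  // so ingress never blocks on putRecord (R6).
  enqueue(intent: CatalogIntent): void {
    this.pending.push(intent);
  }

  // Worker path: attempt each pending write. Failures stay queued for the
  // next pass, so the catalog may lead the PDS until they succeed (CC2).
  drain(putRecord: (intent: CatalogIntent) => boolean): number {
    const retry: CatalogIntent[] = [];
    let synced = 0;
    for (const intent of this.pending) {
      if (putRecord(intent)) {
        synced++;
      } else {
        retry.push(intent);
      }
    }
    this.pending = retry;
    return synced;
  }

  // Number of catalog rows not yet reflected on the PDS.
  backlog(): number {
    return this.pending.length;
  }
}
```

A non-zero backlog is the window in which the catalog is ahead of the PDS. Per the ADR 28 invariant, mesh reads during that window should still follow PDS receipts.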

Decision

1. Substratum mapping to AT Protocol planes

| Concern | Owner | Garage v1 store |
|---|---|---|
| Drive listing, uploads, shares | Gateway | Managed Postgres + gateway blockstore |
| Mesh ACL / provenance | Owner PDS repo | Postgres tranquil_pds |
| Media attached to PDS records | PDS blobstore | DO Spaces |
| Login / OAuth / session JWT | Tranquil PDS upstream; branded UI on apps/pds-portal (ADR 42) (+ gateway session cookie on app.*) | Postgres tranquil_pds |
| Paid metadata writes | Authz proxy + entitlements | Postgres (ADR 37) |

The gateway is Substratum's AppView analog for the storage product — not a replacement for api.bsky.app social feeds.

2. Garage v1 storage layout (operated PDS)

| Data class | Location | Survives droplet replace? |
|---|---|---|
| Blobs | Spaces (substratum-pds-blobs default) | Yes |
| Repo + accounts | Postgres tranquil_pds (managed cluster) | Yes |
| Entitlements / catalog | infra/data Postgres substratum | Yes |
| Tranquil host cache (fallback UI) | Ephemeral boot disk → /var/lib/substratum/tranquil-pds | No; redeploy Tranquil. Superseded by portal for Phase 1 paths (ADR 42) |
| TLS / edge | Caddy on droplet | Recreated by cloud-init |

Backup policy (Garage): Postgres tranquil_pds only, using DO managed cluster automatic daily backups (7-day retention). No droplet automated backups, no block volumes, no volume snapshot CI. CI deploy: .tangled/workflows/pds.yml runs deploy-pds.sh → refresh → up → verify. Droplet rebuild: cloud-init + redeploy Tranquil with the same DATABASE_URL. State: CI assumes Pulumi state matches DO (NR7).
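
The storage layout reduces to one rule: anything that lives only on the droplet boot disk is gone after a replace and must be rebuilt. The sketch below encodes the layout table as data to make that rule checkable. The `Location` labels, the `DataClass` type, and `lostOnDropletReplace` are hypothetical names chosen for illustration.

```typescript
// Illustrative location labels. Only the droplet boot disk is ephemeral.
type Location = "spaces" | "managed-postgres" | "droplet-boot-disk";

interface DataClass {
  name: string;
  location: Location;
}

// Rows taken from the Garage v1 storage layout in this ADR.
const garageLayout: DataClass[] = [
  { name: "Blobs", location: "spaces" },
  { name: "Repo + accounts", location: "managed-postgres" },
  { name: "Entitlements / catalog", location: "managed-postgres" },
  { name: "Tranquil host cache (fallback UI)", location: "droplet-boot-disk" },
  { name: "TLS / edge", location: "droplet-boot-disk" },
];

// Names of the data classes that must be rebuilt after a droplet replace.
function lostOnDropletReplace(layout: DataClass[]): string[] {
  return layout
    .filter((entry) => entry.location === "droplet-boot-disk")
    .map((entry) => entry.name);
}
```

For `garageLayout`, the function returns only the Tranquil host cache and the TLS / edge row. Neither holds customer data: the cache is rebuilt by redeploying Tranquil, and Caddy is recreated by cloud-init.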

3. Scaling posture

| Phase | Users (order of magnitude) | PDS topology |
|---|---|---|
| Garage v1 | 0–500 | Single pds.substratum.cloud, Spaces blobs, Postgres repo store |
| Growth | 10³–10⁵ | Vertical scale droplet; tune Postgres sizing |
| Post-Garage | 10⁵+ | Multi-shard fleet, Entryway-style router, registration-time shard assignment (new ADR) |

Bluesky's per-host SQLite sharding model is not Substratum's Garage v1 path; we operate Tranquil with Postgres repo storage and defer multi-host fleet routing until post-Garage scale.
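
The phase bands can be written as a lookup, which is useful for dashboards or alerts that flag when a topology review is due. This is a sketch with two stated assumptions: the table gives orders of magnitude rather than exact cut-offs, so counts between 501 and 999 are treated here as Growth, and exactly 10⁵ users is treated as Post-Garage. `phaseFor` and the `Phase` type are hypothetical names.

```typescript
type Phase = "Garage v1" | "Growth" | "Post-Garage";

// Maps a Substratum-login user count to a scaling phase.
// Assumption: 501 to 999 counts as Growth, and 100000 and above as Post-Garage.
function phaseFor(users: number): Phase {
  if (!isFinite(users) || users < 0) {
    throw new RangeError("user count must be a non-negative finite number");
  }
  if (users <= 500) return "Garage v1";
  if (users < 100000) return "Growth";
  return "Post-Garage";
}
```

The thresholds are planning bands, not hard limits. CC4's monitoring of Postgres size, droplet RAM, and Spaces usage remains the real trigger for a topology change.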

4. Implementation ownership

| Artifact | Location |
|---|---|
| Pulumi stack | infra/pds/ |
| CI deploy | scripts/ci/deploy-pds.sh: refresh → up → app deploy from data/pds outputs |
| CI verify (infra) | scripts/ci/verify-pds.sh: mid-pipeline; 502 OK before app step |
| CI verify (live) | scripts/ci/verify-pds-live.sh: Tranquil + authz + HTTPS 200 |
| Ops runbook | pds-deployment.md |
| Rollout sequence | garage-v1-rollout.md Phase 3 |
| Billing edge | ADR 37; out of scope for this ADR beyond host placement |
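
The two verify scripts apply different acceptance rules to the public endpoint: the infra check tolerates a 502 because no application sits behind Caddy yet, while the live check requires a 200. The sketch below captures that distinction only. It is not taken from verify-pds.sh or verify-pds-live.sh; `statusAcceptable` and `VerifyStage` are hypothetical names, and accepting 200 at the infra stage is our assumption.

```typescript
type VerifyStage = "infra" | "live";

// Decides whether an HTTP status from the public endpoint passes a stage.
function statusAcceptable(stage: VerifyStage, status: number): boolean {
  if (stage === "live") {
    // After app deploy, only a healthy response passes.
    return status === 200;
  }
  // Mid-pipeline: Caddy is up but Tranquil may not be deployed yet, so 502 is fine.
  // Assumption: 200 is also accepted here, e.g. on a re-run over a live host.
  return status === 200 || status === 502;
}
```

A 502 at the live stage therefore fails the pipeline, which matches CC3's intent that verify-pds-live.sh gates live traffic.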

Rejected alternatives

| Alternative | Why rejected |
|---|---|
| Reference Bluesky PDS (SQLite per DID) | No Postgres repo store; per-DID SQLite on volume conflicts with Tranquil-operated model and managed backup story. |
| Block volume + volume snapshots | Extra cost; customer state is Postgres + Spaces; droplet is cattle. |
| Repo data on droplet disk | Lost on droplet replace; repos belong in tranquil_pds. |
| All PDS data on block volume including blobs | Duplicates S3 value prop; volume cost grows with media; blobs belong in Spaces (R2). |
| Entryway + multi-PDS for Garage | Operational overkill for 0–500 users; deferred (R12, C1). |
| Gateway-as-PDS (no operated PDS) | Conflicts with Substratum login product and ADR 27/37 operated-host requirements. |

Consequences

Positive

  • Clear separation: gateway scales UX, PDS scales identity/commits, Spaces scales bytes.
  • Garage COGS stay predictable (small droplet + usage-based Spaces).
  • Aligns with Bluesky's proven Small World / Big World split without premature fleet complexity.
  • Postgres + Spaces give a clear DR story without volume snapshot ops burden.

Negative

  • PDS droplet is fully cattle — operators must redeploy Tranquil + authz proxy after replace and manage Postgres restore drills.
  • Single PDS is a blast-radius unit until sharding ships.
  • No block volume is provisioned (C7), so anything kept only on the droplet boot disk, such as the Tranquil UI cache (CC1), is lost on replace and must be rebuilt.

Neutral

  • Federated users still resolve via public AppView; this ADR does not change BYO PDS paths.
  • Authz proxy and entitlement schema unchanged — see ADR 37.

Verification

| Scenario | Expected |
|---|---|
| pulumi refresh / pulumi up on infra/pds | Reconcile or update stack; state must match DO. Routine path via Spindle |
| pulumi destroy (local) + Spindle up | Greenfield reprovision; destructive in production |
| SSH smoke (CI infra) | verify-pds.sh: Docker + Caddy; Tranquil deploy docs path |
| App live (post-deploy) | verify-pds-live.sh: Tranquil _health, authz /health, public HTTPS 200 |
| Blob upload on PDS | Object appears in Spaces bucket |
| Simulated droplet replace | pulumi up → redeploy Tranquil with same DATABASE_URL → existing DIDs; blobs still in Spaces |
| File explorer browse | Reads gateway API / Postgres, not PDS listRecords, for the drive tree |