ADR 41: Operated PDS Hosting Topology and AT Protocol Plane Separation
Status: Proposed
Date: 2026-06-11
Last Updated: 2026-06-11 (stateless droplet; Postgres + Spaces DR; portal UI per ADR 42)
Terms (this ADR)
| ID | Term | Meaning |
|---|---|---|
| Rn | Functional requirement | Numbered obligation (R1–R12 in this ADR). |
| NRn | Non-functional requirement | Quality attribute (NR1–NR7). |
| Cn | Constraint | Non-negotiable boundary (C1–C7). |
| CCn | Cross-cutting challenge | Risk spanning components (CC1–CC5). |
| Operated PDS | Substratum-hosted PDS | Tranquil PDS at pds.substratum.cloud for Substratum login handles — distinct from customer BYO PDS. |
| Small World | AT Protocol write plane | User PDS: repo commits, OAuth, identity — hosts only that user's signed records. |
| Big World | AT Protocol read/aggregate plane | Relays, AppViews, and Substratum gateway catalog — heavy indexing and product reads. |
| Repo store (Tranquil) | Postgres tranquil_pds | Managed-cluster database (infra/data) — accounts, MST/repo metadata, OAuth state. |
| Garage phase | Product rollout band | ~0–500 Substratum-login customers (Garage v1 rollout); single operated PDS is sufficient. |
| Entryway | Multi-PDS shard router | Bluesky fleet pattern for routing handles to shard hosts — deferred for Substratum until multi-host fleet. |
Canonical product vocabulary: Glossary.
Context
Substratum operates pds.substratum.cloud for customer Substratum login (ADR 37, Business model). That host has real COGS, durability, and scaling constraints distinct from the gateway SaaS plane.
AT Protocol plane separation
Bluesky scales to tens of millions of users by keeping the PDS small and pushing heavy work elsewhere:
| Layer | Bluesky @ scale | Substratum Garage v1 |
|---|---|---|
| Small World (writes) | Per-user repo on PDS hosts; commit → relay | Tranquil PDS; cloud.substratum.* repo writes via authz proxy |
| Big World (reads) | AppViews index firehose for feeds/search | Gateway Postgres catalog + blockstore/mesh for file-explorer UX |
| Blobs | Enterprise S3 | S3-compatible object storage (DO Spaces) via PDS_BLOBSTORE_S3_* |
| Social AppView | api.bsky.app | Still used for federated handle resolution (ATPROTO_APPVIEW_URL); not Substratum's product read plane |
ADR 28 already defines Substratum's two planes of truth (catalog vs mesh/PDS). This ADR names the hosting and storage decisions that keep the operated PDS in the Small World role while the gateway remains the product read plane — analogous to an AppView for storage, not a replacement for Bluesky's social AppView.
Hosting challenges
- Tranquil is the operated upstream — repo and account data in Postgres tranquil_pds on the managed cluster; DO Spaces (S3-compatible) for blobs — same split as production Tranquil deployments.
- Droplet vs durable data — disposable compute on the PDS droplet; customer repos survive droplet replace via Postgres + Spaces only.
- Garage scale does not need Bluesky fleet topology — sharding, Entryway, and bare-metal fleets are post-Garage concerns; Substratum-login users write lighter repo traffic (metadata lexicons, not full social graphs).
- Billing vs hosting — entitlement enforcement on repo mutations is ADR 37; this ADR covers where bits live and how the host survives failure.
- Cloud-agnostic posture — S3-compatible blobstore, Postgres repo store (Tranquil), and Postgres entitlements (ADR 09) on managed infrastructure.
Requirements
Functional requirements
| ID | Requirement |
|---|---|
| R1 | Garage v1 SHALL operate one Substratum PDS host at pds.substratum.cloud (reserved IP + DNS) provisioned by infra/pds — separate from marketing/app/admin droplets (ADR 36 pattern: Caddy TLS → authz proxy). |
| R2 | Operated PDS blobstore SHALL use S3-compatible object storage (PDS_BLOBSTORE_S3_*); PDS_BLOBSTORE_DISK_LOCATION MUST NOT be set in production. |
| R3 | Operated PDS repo and account data SHALL live in Postgres database tranquil_pds on the infra/data managed cluster — not on the PDS droplet root disk alone. |
| R4 | Operators SHALL rely on DO managed Postgres automatic daily backups (7-day retention) for tranquil_pds; restore procedures SHALL be in data-deployment.md and pds-deployment.md. Do not enable droplet automated backups or volume snapshot CI for Garage v1. |
| R5 | File-explorer and gateway HTTP APIs SHALL remain the primary read path for drives, listings, and uploads; clients MUST NOT depend on PDS XRPC for catalog UX (ADR 28, ADR 30). |
| R6 | Sync workers (ReceiptSyncWorker, future CatalogSyncWorker) SHALL converge catalog intent onto the operated PDS asynchronously; PDS MUST NOT block ingress on synchronous putRecord for catalog rows. |
| R7 | substratum-pds-authz-proxy SHALL sit on the PDS host in front of gated repo mutations per ADR 37; hosting topology MUST NOT require entitlement logic inside the Tranquil PDS container. |
| R8 | Entitlements, catalog, and tranquil_pds repo data SHALL use infra/data managed Postgres — not co-located ad-hoc stores on the PDS droplet. |
| R9 | PDS stack outputs (Spaces bucket, endpoint, scoped keys, data directory path) SHALL be consumable by operators via pulumi stack output without embedding secrets in cloud-init userData. |
| R10 | Local dev Compose (pds-upstream) SHALL use Tranquil PDS behind pds-authz-proxy with MinIO as S3 stand-in — dev/prod parity for blob path (ADR 13). |
| R11 | Account migration off Substratum PDS (BYO PDS, Bluesky) SHALL remain supported via MOOver / AT migration tooling (pds-account-migration.md); hosting choices MUST NOT preclude export. |
| R12 | Post-Garage multi-shard PDS (Entryway, shard registry, registration-time assignment) SHALL be a follow-on ADR or revision — not implemented during Garage v1. |
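
R2 and R3 lend themselves to a mechanical check before a production deploy. The sketch below assumes the configuration is available as a plain key/value map. Only the names PDS_BLOBSTORE_S3_*, PDS_BLOBSTORE_DISK_LOCATION, and DATABASE_URL come from this ADR; the function name `checkProdEnv` and the localhost heuristic are illustrative assumptions, not part of Tranquil or the Substratum codebase.

```typescript
// Hypothetical pre-deploy guard for R2/R3 (not an existing Substratum tool).
type Env = Record<string, string | undefined>;

const S3_PREFIX = "PDS_BLOBSTORE_S3_";

function checkProdEnv(env: Env): string[] {
  const problems: string[] = [];

  // R2: at least one non-empty PDS_BLOBSTORE_S3_* setting must be present.
  const hasS3 = Object.keys(env).some(
    (key) => key.indexOf(S3_PREFIX) === 0 && (env[key] ?? "") !== "",
  );
  if (!hasS3) problems.push("R2: no PDS_BLOBSTORE_S3_* setting found");

  // R2: the disk blobstore must not be configured in production.
  if ((env["PDS_BLOBSTORE_DISK_LOCATION"] ?? "") !== "") {
    problems.push("R2: PDS_BLOBSTORE_DISK_LOCATION must not be set");
  }

  // R3: repo data lives on the managed cluster. Rejecting localhost URLs is
  // an assumed heuristic for "not on the PDS droplet".
  const dbUrl = env["DATABASE_URL"] ?? "";
  if (dbUrl === "") {
    problems.push("R3: DATABASE_URL is not set");
  } else if (/(@|\/\/)(localhost|127\.0\.0\.1)[:/]/.test(dbUrl)) {
    problems.push("R3: DATABASE_URL points at the local host");
  }

  return problems;
}
```

An empty result means the map passes these checks. It does not prove the values themselves are valid credentials or reachable endpoints.
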
Non-functional requirements
| ID | Requirement |
|---|---|
| NR1 | Garage COGS target: default PDS droplet s-1vcpu-2gb (~$12/mo); blobs billed via Spaces; tranquil_pds on shared managed Postgres — see infra/pds/AGENTS.md sizing table. |
| NR2 | Backup RPO: DO managed Postgres daily backups for tranquil_pds (7-day retention, automatic) — document restore in ops runbook. |
| NR3 | Replaceability: droplet reprovision via pulumi up recreates cattle boot disk; redeploy Tranquil with unchanged DATABASE_URL; Postgres and Spaces hold customer data (ADR 37 C10). |
| NR4 | Region alignment: PDS droplet and Spaces bucket SHOULD share region (default nyc3) to minimize latency and egress. |
| NR5 | Security: Spaces blob bucket ACL private; PDS Spaces keys scoped readwrite to blob bucket only. |
| NR6 | PDS hosting decisions MUST remain compatible with twelve-factor IaC (ADR 13, infra/AGENTS.md): config in Pulumi stack keys, thin index.ts, secrets via pulumi config set --secret. |
| NR7 | Pulumi state discipline: production PDS infrastructure changes MUST go through pulumi (local destroy or Spindle refresh/up) — operators MUST NOT delete droplet, reserved IP, or firewall in the DigitalOcean console without matching state updates. Stack pds on substratum-pds is the authoritative resource map; drift blocks CI refresh. Greenfield teardown: local pulumi destroy, then Spindle pulumi up. See pds-deployment.md. |
Constraints
| ID | Constraint |
|---|---|
| C1 | Garage v1 MUST NOT deploy Entryway or multiple operated Tranquil backends behind one public hostname. |
| C2 | MUST NOT run two active Tranquil instances against the same DATABASE_URL (Postgres single-writer); HA is backup/restore, not active-active replicas without upstream support. |
| C3 | MUST NOT co-locate tranquil_pds on the PDS droplet disk — use managed Postgres (infra/data). |
| C4 | Gateway blockstore (ADR 08) and PDS blobstore MAY use different buckets/keys — do not conflate safety-net bytes with PDS media blobs. |
| C5 | Private cloud.substratum.* namespace policy (ADR 29) is independent of hosting — registration as PDS-private collections remains a separate workstream. |
| C6 | BYO PDS users MUST NOT be routed through Substratum operated-PDS infrastructure (ADR 37 R13). |
| C7 | Do not enable droplet automated backups or block volume snapshots for PDS DR — customer repos use Postgres tranquil_pds backups (DO managed, automatic) and Spaces blobs. |
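
C2 can be expressed as an inventory check: no two active Tranquil instances may share a DATABASE_URL. The sketch below is a minimal illustration under that reading. The `TranquilInstance` shape and the function name are hypothetical, and where such an inventory would come from (Pulumi outputs, a deploy manifest) is left open.

```typescript
// Hypothetical inventory record; field names are illustrative only.
interface TranquilInstance {
  name: string;
  databaseUrl: string;
  active: boolean;
}

// Groups active instances by DATABASE_URL and returns every URL that has
// more than one active writer, which C2 forbids.
function findSharedWriters(
  instances: TranquilInstance[],
): Record<string, string[]> {
  const byUrl: Record<string, string[]> = {};
  for (const inst of instances) {
    if (!inst.active) continue; // a stopped instance is not a writer
    const names = byUrl[inst.databaseUrl] ?? [];
    names.push(inst.name);
    byUrl[inst.databaseUrl] = names;
  }

  const violations: Record<string, string[]> = {};
  for (const url of Object.keys(byUrl)) {
    const names = byUrl[url] ?? [];
    if (names.length > 1) violations[url] = names;
  }
  return violations;
}
```

During a droplet replace, the old instance would need to be marked inactive before the new one starts for this check to stay clean.
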
Cross-cutting challenges
| ID | Challenge | Mitigation |
|---|---|---|
| CC1 | Tranquil optional UI cache on droplet boot disk is ephemeral | Recreate on redeploy; repo data remains in Postgres. |
| CC2 | Operators confuse gateway catalog with PDS repo source of truth | ADR 28 invariant: mesh reads follow PDS receipts; catalog may lead during sync — never treat Postgres ACL alone as mesh authority. |
| CC3 | Droplet replaced without redeploying Tranquil + authz proxy | Spindle deploy-pds.sh redeploys apps from Pulumi outputs after infra up; verify-pds-live.sh gates live traffic. |
| CC4 | Scale past single-host Tranquil limits | Monitor Postgres size, droplet RAM, and Spaces usage; post-Garage: multi-host fleet + Entryway-style router (R12). |
| CC5 | Secrets (PDS_JWT_SECRET, PLC rotation key) lost on rebuild | Store in operator secret manager; document rotation in runbook. |
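
CC2 and R6 both rest on the catalog being allowed to lead while the PDS converges later. A minimal sketch of that shape follows, assuming an in-memory queue and a synchronous `putRecord` callback for brevity. The real workers (ReceiptSyncWorker, CatalogSyncWorker) are not specified in this ADR, so every type and method name here is hypothetical.

```typescript
// Hypothetical shapes; real worker and record types are defined elsewhere.
interface CatalogIntent {
  rkey: string;
  payload: string;
}

// Stand-in for the PDS write; returns true when the record was accepted.
type PutRecord = (intent: CatalogIntent) => boolean;

class SyncQueue {
  private pending: CatalogIntent[] = [];

  // Ingress path: record the intent and return at once, with no PDS call.
  enqueue(intent: CatalogIntent): void {
    this.pending.push(intent);
  }

  // Worker path: try each pending intent once. Failed writes stay queued
  // so a later pass can retry them.
  drain(putRecord: PutRecord): number {
    const stillPending: CatalogIntent[] = [];
    let synced = 0;
    for (const intent of this.pending) {
      if (putRecord(intent)) synced++;
      else stillPending.push(intent);
    }
    this.pending = stillPending;
    return synced;
  }

  size(): number {
    return this.pending.length;
  }
}
```

While `size()` is non-zero the catalog is ahead of the PDS, which is the window in which CC2 warns against treating catalog rows as mesh authority.
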
Decision
1. Substratum mapping to AT Protocol planes
| Concern | Owner | Garage v1 store |
|---|---|---|
| Drive listing, uploads, shares | Gateway | Managed Postgres + gateway blockstore |
| Mesh ACL / provenance | Owner PDS repo | Postgres tranquil_pds |
| Media attached to PDS records | PDS blobstore | DO Spaces |
| Login / OAuth / session JWT | Tranquil PDS upstream; branded UI on apps/pds-portal (ADR 42) (+ gateway session cookie on app.*) | Postgres tranquil_pds |
| Paid metadata writes | Authz proxy + entitlements | Postgres (ADR 37) |
The gateway is Substratum's AppView analog for the storage product — not a replacement for api.bsky.app social feeds.
2. Garage v1 storage layout (operated PDS)
| Data class | Location | Survives droplet replace? |
|---|---|---|
| Blobs | Spaces (substratum-pds-blobs default) | Yes |
| Repo + accounts | Postgres tranquil_pds (managed cluster) | Yes |
| Entitlements / catalog | infra/data Postgres substratum | Yes |
| Tranquil host cache (fallback UI) | Ephemeral boot disk → /var/lib/substratum/tranquil-pds | No — redeploy Tranquil; superseded by portal for Phase 1 paths (ADR 42) |
| TLS / edge | Caddy on droplet | Recreated by cloud-init |
Backup policy (Garage): Postgres tranquil_pds only — DO managed cluster automatic daily backups (7-day retention). No droplet automated backups, no block volumes, no volume snapshot CI. CI deploy: .tangled/workflows/pds.yml runs deploy-pds.sh — refresh → up → verify. Droplet rebuild: cloud-init + redeploy Tranquil with the same DATABASE_URL. State: CI assumes Pulumi state matches DO (NR7).
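
The CI sequence above is a fail-fast pipeline: each stage runs only if the previous one succeeded. The sketch below illustrates that control flow only. The real logic lives in deploy-pds.sh, and the `Step` shape and `runPipeline` name are assumptions made for illustration.

```typescript
// Hypothetical model of a fail-fast deploy sequence. The run callbacks stand
// in for whatever each stage of the real script executes.
interface Step {
  name: string;
  run: () => boolean; // true when the step succeeded
}

interface PipelineResult {
  completed: string[];
  failedAt: string | null;
}

// Runs steps in order and stops at the first failure, so later stages never
// run against infrastructure that failed an earlier stage.
function runPipeline(steps: Step[]): PipelineResult {
  const completed: string[] = [];
  for (const step of steps) {
    if (!step.run()) return { completed, failedAt: step.name };
    completed.push(step.name);
  }
  return { completed, failedAt: null };
}
```

Called with steps named refresh, up, and verify in that order, a failed refresh (for example from state drift, NR7) leaves `completed` empty and prevents `up` from running.
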
3. Scaling posture
| Phase | Users (order of magnitude) | PDS topology |
|---|---|---|
| Garage v1 | 0–500 | Single pds.substratum.cloud, Spaces blobs, Postgres repo store |
| Growth | 10³–10⁵ | Vertical scale droplet; tune Postgres sizing |
| Post-Garage | 10⁵+ | Multi-shard fleet, Entryway-style router, registration-time shard assignment — new ADR |
Bluesky's per-host SQLite sharding model is not Substratum's Garage v1 path; we operate Tranquil with Postgres repo storage and defer multi-host fleet routing until post-Garage scale.
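
For capacity alerts, the phase table can be reduced to a small lookup. The sketch below is illustrative only: the 500 and 10⁵ boundaries come from the table, while assigning the unlisted 501 to 10³ range to Growth, and exactly 10⁵ to Post-Garage, are assumptions.

```typescript
type Phase = "Garage v1" | "Growth" | "Post-Garage";

// Maps a Substratum-login user count to a rollout phase.
// Assumption: counts between 501 and 99,999 are treated as Growth, because
// the table leaves the 500 to 1,000 range unassigned.
function phaseFor(users: number): Phase {
  if (!isFinite(users) || users < 0) {
    throw new RangeError("users must be a non-negative finite number");
  }
  if (users <= 500) return "Garage v1";
  if (users < 100000) return "Growth";
  return "Post-Garage";
}
```
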
4. Implementation ownership
| Artifact | Location |
|---|---|
| Pulumi stack | infra/pds/ |
| CI deploy | scripts/ci/deploy-pds.sh — refresh → up → app deploy from data/pds outputs |
| CI verify (infra) | scripts/ci/verify-pds.sh — mid-pipeline; 502 OK before app step |
| CI verify (live) | scripts/ci/verify-pds-live.sh — Tranquil + authz + HTTPS 200 |
| Ops runbook | pds-deployment.md |
| Rollout sequence | garage-v1-rollout.md Phase 3 |
| Billing edge | ADR 37 — out of scope for this ADR beyond host placement |
Rejected alternatives
| Alternative | Why rejected |
|---|---|
| Reference Bluesky PDS (SQLite per DID) | No Postgres repo store; per-DID SQLite on volume conflicts with Tranquil-operated model and managed backup story. |
| Block volume + volume snapshots | Extra cost; customer state is Postgres + Spaces; droplet is cattle. |
| Repo data on droplet disk | Lost on droplet replace; repos belong in tranquil_pds. |
| All PDS data on block volume including blobs | Duplicates S3 value prop; volume cost grows with media; blobs belong in Spaces (R2). |
| Entryway + multi-PDS for Garage | Operational overkill for 0–500 users; deferred (R12, C1). |
| Gateway-as-PDS (no operated PDS) | Conflicts with Substratum login product and ADR 27/37 operated-host requirements. |
Consequences
Positive
- Clear separation: gateway scales UX, PDS scales identity/commits, Spaces scales bytes.
- Garage COGS stay predictable (small droplet + usage-based Spaces).
- Aligns with Bluesky's proven Small World / Big World split without premature fleet complexity.
- Postgres + Spaces give a clear DR story without volume snapshot ops burden.
Negative
- PDS droplet is fully cattle — operators must redeploy Tranquil + authz proxy after replace and manage Postgres restore drills.
- Single PDS is a blast-radius unit until sharding ships.
- No block volume or droplet backup fallback: recovery depends on the managed Postgres daily backups (7-day retention) and Spaces.
Neutral
- Federated users still resolve via public AppView; this ADR does not change BYO PDS paths.
- Authz proxy and entitlement schema unchanged — see ADR 37.
Verification
| Scenario | Expected |
|---|---|
| pulumi refresh / pulumi up on infra/pds | Reconcile or update stack — state must match DO; routine path via Spindle |
| pulumi destroy (local) + Spindle up | Greenfield reprovision — production destructive |
| SSH smoke (CI infra) | verify-pds.sh: Docker + Caddy; Tranquil deploy docs path |
| App live (post-deploy) | verify-pds-live.sh: Tranquil _health, authz /health, public HTTPS 200 |
| Blob upload on PDS | Object appears in Spaces bucket |
| Simulated droplet replace | pulumi up → redeploy Tranquil with same DATABASE_URL → existing DIDs; blobs still in Spaces |
| File explorer browse | Reads gateway API / Postgres — not PDS listRecords for drive tree |
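
The "App live" row amounts to a gate over three probe results. The sketch below separates the pass/fail decision from the probing itself, so it makes no network calls. The target labels, the `ProbeResult` shape, and the reading that all three probes must return HTTP 200 are assumptions; verify-pds-live.sh remains the authoritative check.

```typescript
// Hypothetical result record. How each probe is executed (curl, SSH, HTTP
// client) is outside this sketch.
interface ProbeResult {
  target: "tranquil-health" | "authz-health" | "public-https";
  status: number; // HTTP status code, or 0 when the request failed
}

const REQUIRED_TARGETS: string[] = [
  "tranquil-health",
  "authz-health",
  "public-https",
];

// The gate passes only when every required target reported HTTP 200.
function liveGate(results: ProbeResult[]): { ok: boolean; failures: string[] } {
  const failures: string[] = [];
  for (const target of REQUIRED_TARGETS) {
    const hit = results.filter((r) => r.target === target)[0];
    if (!hit) failures.push(target + ": no result");
    else if (hit.status !== 200) failures.push(target + ": HTTP " + hit.status);
  }
  return { ok: failures.length === 0, failures };
}
```

A missing probe counts as a failure, so a partially run verification cannot open the gate.
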
Related
- Glossary
- ADR 08: Hosting and Frontend Stack — gateway blockstore S3
- ADR 09: Database and ORM — entitlements Postgres
- ADR 13: Twelve-Factor App
- ADR 27: Zero Trust PDS-Based Provenance
- ADR 28: Receipt Sync Queue — catalog vs mesh planes
- ADR 29: Private PDS Namespace
- ADR 30: Catalog–PDS Dual-Write
- ADR 36: Marketing Landing Page — DO droplet pattern
- ADR 37: PDS Entitlement Proxy — billing edge on PDS host
- ADR 42: Branded PDS Portal — apps/pds-portal identity UI on pds.*
- Garage v1 rollout
- PDS deployment runbook
- infra/pds/AGENTS.md
- AT Protocol federation overview
- Bluesky Entryway guide