Substratum-operated PDS deployment
Last Updated: 2026-06-11 (stateless droplet; Postgres + Spaces DR only)
Runbook for pds.substratum.cloud: Tranquil PDS upstream (recommended), PDS repo authz proxy, and TLS edge (Garage Phase 3). Design: ADR 37 (billing edge), ADR 41 (hosting topology).
Topology
| Component | Role |
|---|---|
| Caddy | TLS termination; reverse_proxy to authz proxy on loopback :3000 |
| substratum-pds-authz-proxy | Gates cloud.substratum.* createRecord / putRecord; allows deleteRecord when lapsed |
| Tranquil PDS | AT Protocol upstream on 127.0.0.1:2583 — not on public :443 without proxy |
| DO Spaces | S3-compatible blobstore (S3_BUCKET / Spaces outputs from Pulumi) |
| Postgres substratum | Entitlement lookups (DATABASE_URL for authz proxy) |
| Postgres tranquil_pds | Tranquil repo + account data (separate database on managed cluster) |
| apps/pds-portal | Branded identity SPA static root (/var/www/pds-portal prod; dist/apps/pds-portal Compose) |
Caddy path routing (portal + proxy)
| Path | Handler |
|---|---|
| /xrpc/*, /.well-known/*, /oauth/*, /health | substratum-pds-authz-proxy → Tranquil |
| /app/* | Portal SPA (try_files → index.html) |
| / | Portal SPA fallback |
See apps/pds-portal/AGENTS.md and docker/tranquil-pds/Caddyfile.
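The routing table above can be modeled as a small shell function, which helps when reasoning about which handler a given request path should reach. This is only a sketch of the table, not the Caddyfile itself: `route_handler` and its return labels are hypothetical names, and treating every unlisted path as the portal fallback is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: classify a request path according to the routing table.
# route_handler and its labels are illustrative, not part of the repo.
route_handler() {
  case "$1" in
    /xrpc/*|/.well-known/*|/oauth/*|/health) echo "authz-proxy" ;;
    /app/*) echo "portal-spa" ;;
    *) echo "portal-fallback" ;;   # assumption: unlisted paths fall through to the SPA
  esac
}
```

For example, `route_handler /xrpc/com.atproto.repo.putRecord` prints `authz-proxy`, while `route_handler /app/settings` prints `portal-spa`.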
License: Tranquil PDS is AGPL-3.0. Review AGPL obligations before operating a production instance.
Local dev (Compose)

    docker compose --profile gateway up -d --build

| Service | Port | Notes |
|---|---|---|
| pds-authz-proxy | localhost:3000 | Public PDS URL for OAuth UI |
| pds-upstream | internal :2583 | Tranquil PDS from atcr.io (ATCR login in .env) |
Set PDS_AUTHZ_GATE_ENABLED=false only when debugging proxy logic without entitlement rows.
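After the Compose stack starts, the proxy may need a few seconds before it answers. A minimal polling sketch follows, assuming the local proxy serves the same `/health` endpoint with body `ok` that the live verification checks on the droplet. `wait_for_health`, the retry count, and the delay are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll a health URL until it answers "ok" or attempts run out.
# wait_for_health is a hypothetical helper; defaults are assumptions.
wait_for_health() {
  local url="$1" tries="${2:-30}" delay="${3:-2}" body i
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors; -sS keeps it quiet apart from real errors
    if body="$(curl -fsS --max-time 5 "$url" 2>/dev/null)" && [ "$body" = "ok" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $tries attempts: $url" >&2
  return 1
}

# Example (not run automatically):
#   wait_for_health http://localhost:3000/health
```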
Production infra (infra/pds)
Environment: pds.substratum.cloud is production. Customer repo and account data live in Postgres tranquil_pds (managed cluster); media blobs in DO Spaces. Treat Pulumi state, managed Postgres, and Spaces as production data — not disposable dev sandboxes.
Per ADR 41: Tranquil PDS upstream; Spaces for blobs; Postgres tranquil_pds for repo/account state (DO managed-cluster daily backups, 7-day retention). The PDS droplet is cattle — no block volume, no droplet automated backups, no volume snapshot CI. Garage v1: single pds.substratum.cloud host.
Pulumi state discipline (production)
Pulumi Cloud stack straiforos-org/substratum-pds/pds is the authoritative map between resource URNs and live DigitalOcean objects (droplet, reserved IP, firewall, DNS, Spaces keys). CI and operators must keep that map aligned with reality.
| Rule | Why |
|---|---|
| Never delete PDS droplet, reserved IP, or firewall in the DO console without a matching pulumi destroy or pulumi state delete on an operator workstation | Console-only deletes cause state drift: pulumi refresh fails (404 on reserved IP) and Spindle deploys stall |
| Routine changes go through Spindle: push to main or trigger Deploy PDS (infra + apps) | deploy-pds.sh runs refresh → up → verify → app deploy → live verify |
| pulumi refresh reconciles drift when resources still exist in DO but differ from state | It does not repair state after console deletes; fix drift with pulumi destroy locally or explicit pulumi state delete (last resort, ops log required) |
| Do not enable DO droplet automated backups or pay for volume snapshots | Customer repos live in Postgres + Spaces only; boot disk is reprovisioned by cloud-init |
| One operator drives destructive stack ops — announce in ops channel, record stack name, reason, and timestamp | Avoid concurrent local pulumi up while Spindle runs deploy-pds.sh |
Approved change classes
| Goal | Path |
|---|---|
| Config / cloud-init / sizing tweak | Push → Spindle pds.yml (or local pulumi up only when coordinating with CI) |
| Droplet replace | Spindle or pulumi up — redeploy Tranquil + authz proxy; Postgres and Spaces unchanged |
| Full greenfield (retire host) | Greenfield reprovision — local pulumi destroy, then push for Spindle up |
| Emergency state repair | pulumi state delete for specific URNs — break-glass only; document before and after in ops log |
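The break-glass path can be sketched as: back up state, log the intent, delete one URN, log the result. The `pulumi stack export`, `pulumi state delete`, and `pulumi stack --show-urns` calls are standard CLI commands. `ops_log_entry`, `repair_state`, the backup filename, and the log path are hypothetical and stand in for whatever ops log the team actually uses.

```shell
#!/usr/bin/env bash
# Sketch of a break-glass state repair. Helper names and the log path are
# hypothetical; only the pulumi subcommands are real CLI calls.

# Format one ops-log line: UTC timestamp, stack, URN, reason.
ops_log_entry() {
  local stack="$1" urn="$2" reason="$3"
  printf '%s stack=%s urn=%s reason=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$stack" "$urn" "$reason"
}

# Back up state, record the intent, then drop a single URN from state.
repair_state() {
  local stack="$1" urn="$2" reason="$3" log="${OPS_LOG:-ops-log.txt}"
  pulumi stack export --stack "$stack" \
    --file "state-backup-$(date -u +%Y%m%dT%H%M%SZ).json" || return 1
  ops_log_entry "$stack" "$urn" "before: $reason" >>"$log"
  pulumi state delete "$urn" --stack "$stack" --yes || return 1
  ops_log_entry "$stack" "$urn" "after: removed from state" >>"$log"
}

# List URNs first to find the exact one to remove (not run automatically):
#   pulumi stack --show-urns --stack pds
```

Exporting state before the delete gives a file that `pulumi stack import` can restore if the wrong URN is removed.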
Stack outputs are operational contracts. dropletIp, pdsSshPrivateKey, and Spaces credentials in outputs must match live DO before SSH or manual deploy steps.
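One way to honor that contract during manual work is to build the SSH session from stack outputs alone rather than from copied values. The sketch below assumes `pdsSshPrivateKey` is a secret output (hence `--show-secrets`) and that the droplet accepts logins as `root`. Both are assumptions, and `pds_ssh` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: open an SSH session using only Pulumi stack outputs.
# pds_ssh is a hypothetical helper; the root login user is an assumption.
pds_ssh() {
  local stack="${PDS_STACK:-pds}" ip key status
  ip="$(pulumi stack output dropletIp --stack "$stack")" || return 1
  key="$(mktemp)" || return 1
  chmod 600 "$key"
  # Assumed to be a secret output, so --show-secrets is needed to read it
  if ! pulumi stack output pdsSshPrivateKey --stack "$stack" --show-secrets >"$key"; then
    rm -f "$key"
    return 1
  fi
  ssh -i "$key" -o IdentitiesOnly=yes "root@$ip" "$@"
  status=$?
  rm -f "$key"   # do not leave the private key on disk
  return "$status"
}

# Example (not run automatically):
#   pds_ssh systemctl status substratum-pds-authz-proxy
```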
Durable state and backups
| Data | Where | Backup |
|---|---|---|
| Repos / accounts | Postgres tranquil_pds (infra/data) | DO managed DB — daily, 7-day retention (automatic) |
| Media blobs | DO Spaces | Bucket in same DO project; lifecycle policies optional |
| Droplet boot disk | Ephemeral | None — cloud-init + runbook redeploy |
See Data deployment for Postgres stack outputs. Restore Tranquil from tranquilPdsUri after any droplet replace.
Droplet rebuild
When CI or pulumi up replaces the droplet, cloud-init reprovisions Caddy and Docker on a fresh VM. Postgres tranquil_pds and PDS blobs in Spaces are unaffected. Spindle deploy-pds.sh redeploys Tranquil and the authz proxy automatically from Pulumi outputs.
Greenfield reprovision
Destructive production procedure: use it when retiring the PDS host. Postgres tranquil_pds and Spaces blobs are independent of this stack, so pulumi destroy removes only the droplet, DNS, and Spaces-key infrastructure. Data is lost only if you also wipe the data stack or the bucket.
- Clear legacy stack keys if present:

      cd infra/pds
      pulumi stack select pds
      pulumi config rm dataDirectory --yes 2>/dev/null || true
      pulumi config rm volumeSizeGb --yes 2>/dev/null || true
      pulumi config rm volumeName --yes 2>/dev/null || true
      pulumi config rm pdsDeployPrivateKey --yes 2>/dev/null || true
      pulumi config rm sshPublicKeys --yes 2>/dev/null || true

- Destroy the stack locally:

      pulumi destroy --yes

  Requires PULUMI_ACCESS_TOKEN, DIGITALOCEAN_TOKEN, and the same stack config CI uses. Do not delete droplet/IP in the DO console instead of pulumi destroy.

  If pulumi destroy fails because resources were already removed in the console (state drift), repair state on the operator workstation before retrying. Record every pulumi state delete in the ops log.

- Push to main (or trigger Spindle Deploy PDS infra). CI runs deploy-pds.sh: pulumi refresh → pulumi up → verify.
- After CI succeeds, verify-pds-live.sh confirms Tranquil, authz proxy, and public HTTPS 200; no manual app deploy is required when Spindle secrets are set.
- Legacy cleanup: delete the orphaned pds-data block volume and any old volume snapshots in the DO console if they remain from pre-stateless stacks.

PDS blobs in Spaces survive stack destroy; wipe the bucket separately in DO only when intentionally resetting blob storage.
CI (Spindle)
.tangled/workflows/pds.yml — on main push (or manual):
deploy-pds.sh: refresh → pulumi up → infra verify → Tranquil DB grants → deploy-pds-app.sh → verify-pds-live.sh
Full stack deploy. CI provisions the droplet (Caddy TLS, Docker, DNS), reads infra/data outputs (tranquilPdsUri, databaseUri) and infra/pds outputs (Spaces keys, pdsJwtSecret, domain), renders env files on the droplet, deploys Tranquil (Docker) + authz proxy (systemd), and asserts live health.
| Script | When | Pass criteria |
|---|---|---|
| verify-pds.sh | Mid-pipeline (before app deploy) | Infra bootstrap; public HTTPS any status (502 OK until app step) |
| deploy-pds-app.sh | Every deploy-pds.sh | Renders env from Pulumi; Tranquil :2583 + authz :3000 healthy on droplet |
| verify-pds-live.sh | End of deploy-pds.sh | Tranquil :2583/xrpc/_health, authz :3000/health, portal / HTML, OAuth metadata JSON, public HTTPS 200 |
CI assumes Pulumi state matches DigitalOcean. Console-only deletes block refresh until an operator repairs state locally (see Pulumi state discipline).
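The stage order in deploy-pds.sh can be pictured as a fail-fast step runner. The sketch below only illustrates that ordering and is not the script's actual contents: `run_pipeline` and the `stage_*` functions are hypothetical, and the paths for verify-pds.sh and bootstrap-tranquil-pds-db.sh are assumed to sit next to the other CI scripts.

```shell
#!/usr/bin/env bash
# Sketch: run named stages in order and stop at the first failure.
# run_pipeline is hypothetical; the real deploy-pds.sh may be structured differently.
run_pipeline() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "stage failed: $step" >&2
      return 1
    fi
  done
}

# Placeholder stages mirroring the documented order. Each body is an assumption.
stage_refresh()      { pulumi refresh --yes; }
stage_up()           { pulumi up --yes; }
stage_infra_verify() { ./scripts/ci/verify-pds.sh; }                # path assumed
stage_db_grants()    { ./scripts/ci/bootstrap-tranquil-pds-db.sh; } # path assumed
stage_app_deploy()   { ./scripts/ci/deploy-pds-app.sh; }
stage_live_verify()  { ./scripts/ci/verify-pds-live.sh; }

# Example (not run automatically):
#   run_pipeline stage_refresh stage_up stage_infra_verify \
#     stage_db_grants stage_app_deploy stage_live_verify
```

Stopping at the first failed stage matches the behavior described above: a refresh that fails on drifted state prevents every later stage from running.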
Spindle secrets: PULUMI_ACCESS_TOKEN, DIGITALOCEAN_TOKEN, ATCR_HANDLE, ATCR_APP_PASSWORD, optional PDS_JWT_SECRET (or set pdsJwtSecret on the PDS Pulumi stack), optional DIGITALOCEAN_PROJECT_ID, PDS_DOMAIN (default pds.substratum.cloud), PDS_SSH_HOST.
Deploy SSH key comes from Pulumi stack output pdsSshPrivateKey — not a Spindle secret.
Prerequisites: infra/data stack applied (tranquilPdsUri, databaseUri outputs). First deploy: set pdsJwtSecret once (pulumi config set pdsJwtSecret "$(openssl rand -base64 48)" --secret in infra/pds) or provide PDS_JWT_SECRET in Spindle.
Operator reference (manual / debug)
Use these when debugging outside CI or when Spindle cannot reach managed Postgres for grants.
Tranquil Postgres grants
bootstrap-tranquil-pds-db.sh runs automatically in deploy-pds.sh. If the runner is firewall-blocked, SSH to a droplet tagged substratum and run it manually.
Re-deploy apps only

    export PULUMI_ACCESS_TOKEN=… DIGITALOCEAN_TOKEN=… ATCR_HANDLE=… ATCR_APP_PASSWORD=…
    ./scripts/ci/deploy-pds-app.sh
    ./scripts/ci/verify-pds-live.sh

Example env files
infra/pds/tranquil-pds.env.example documents the shape of /etc/tranquil-pds/tranquil.env (CI renders this from stack outputs). Authz proxy env: /etc/substratum/pds-authz-proxy.env (DATABASE_URL = entitlements databaseUri from data stack).
| File | Purpose |
|---|---|
| docker-compose.production.yml | Tranquil upstream on loopback :2583 |
| substratum-pds-authz-proxy.service | systemd unit for authz proxy on :3000 |
| tranquil-pds.env.example | Reference only; not copied manually in CI |
Set JWT_SECRET / PDS_JWT_SECRET to the same value (ADR 37 R19). CI uses pdsJwtSecret Pulumi output or PDS_JWT_SECRET Spindle secret.
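When debugging on the droplet, a quick comparison of the two rendered env files can confirm the secrets agree without printing them. The sketch assumes JWT_SECRET lives in the Tranquil env file and PDS_JWT_SECRET in the authz proxy env file, and that both are plain unquoted KEY=value lines. `env_value` and `jwt_secrets_match` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: confirm both services were rendered with the same JWT secret.
# Which key lives in which file is an assumption; adjust to the real env files.

# Print the value of KEY from a KEY=value env file (last assignment wins).
env_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Succeed only when both values are present and identical; never echo them.
jwt_secrets_match() {
  local tranquil_env="${1:-/etc/tranquil-pds/tranquil.env}"
  local proxy_env="${2:-/etc/substratum/pds-authz-proxy.env}"
  local a b
  a="$(env_value "$tranquil_env" JWT_SECRET)"
  b="$(env_value "$proxy_env" PDS_JWT_SECRET)"
  # An empty value counts as a mismatch so a missing key is not silently accepted.
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (not run automatically):
#   jwt_secrets_match && echo "secrets match" || echo "MISMATCH" >&2
```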
Verify live (strict)

    ./scripts/ci/verify-pds-live.sh
    # or: pnpm run pds:verify-live

Checks (in order):
- SSH: Tranquil GET /xrpc/_health on 127.0.0.1:2583
- SSH: authz proxy GET /health on 127.0.0.1:3000 (body ok)
- Public: https://pds.substratum.cloud/ returns HTTP 200 (not 502)
Lapsed DID putRecord → 403 substratum.entitlement.metadata_write_denied; deleteRecord → allow.
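The expected gate behavior can be written down as a small decision function, which is useful as a checklist when testing the proxy by hand. This is a hypothetical model of the rule stated in this runbook, not the proxy's code. `authz_decision` is an invented name. Passing non-`cloud.substratum.*` collections through, and reusing the documented putRecord error code for createRecord, are both assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the gate's decision rule as described in this runbook.
# authz_decision is a hypothetical model, not the proxy's real code.
# Args: XRPC method NSID, record collection NSID, entitlement state (active|lapsed).
authz_decision() {
  local method="$1" collection="$2" state="$3"
  case "$collection" in
    cloud.substratum.*) ;;            # gated namespace
    *) echo "allow"; return 0 ;;      # assumption: other collections pass through
  esac
  case "$method" in
    com.atproto.repo.createRecord|com.atproto.repo.putRecord)
      if [ "$state" = "active" ]; then
        echo "allow"
      else
        # Documented for putRecord; applying it to createRecord is an assumption.
        echo "deny 403 substratum.entitlement.metadata_write_denied"
      fi ;;
    *) echo "allow" ;;                # deleteRecord stays allowed when lapsed
  esac
}
```

For a lapsed DID, `authz_decision com.atproto.repo.putRecord cloud.substratum.example lapsed` prints the deny line, while the same call with `com.atproto.repo.deleteRecord` prints `allow`. The collection name `cloud.substratum.example` is a placeholder.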
Invite-only signup
Garage v1 customer registration flows through POST /api/v1/pds/signup on the gateway (creates one-time invite codes). Do not enable open PDS createAccount on the public host until the authz proxy is live (ADR 37 C10).
Staff DIDs (Phase 4 prep)
- Create staff handle on Substratum PDS (invite flow).
- Insert operator_role row for staff did (see entitlement-admin.md).
- Verify staff PDS OAuth before admin.* ships.