
Substratum-operated PDS deployment

Last Updated: 2026-06-11 (stateless droplet; Postgres + Spaces DR only)

Runbook for pds.substratum.cloud: Tranquil PDS upstream (recommended), PDS repo authz proxy, and TLS edge (Garage Phase 3). Design: ADR 37 (billing edge), ADR 41 (hosting topology).

Topology

| Component | Role |
| --- | --- |
| Caddy | TLS termination; reverse_proxy to authz proxy on loopback :3000 |
| substratum-pds-authz-proxy | Gates cloud.substratum.* createRecord / putRecord; allows deleteRecord when lapsed |
| Tranquil PDS | AT Protocol upstream on 127.0.0.1:2583; not exposed on public :443 without the proxy |
| DO Spaces | S3-compatible blobstore (S3_BUCKET / Spaces outputs from Pulumi) |
| Postgres substratum | Entitlement lookups (DATABASE_URL for authz proxy) |
| Postgres tranquil_pds | Tranquil repo + account data (separate database on managed cluster) |
| apps/pds-portal | Branded identity SPA static root (/var/www/pds-portal prod; dist/apps/pds-portal Compose) |

Caddy path routing (portal + proxy)

| Path | Handler |
| --- | --- |
| /xrpc/*, /.well-known/*, /oauth/*, /health | substratum-pds-authz-proxy → Tranquil |
| /app/* | Portal SPA (try_files → index.html) |
| / | Portal SPA fallback |

See apps/pds-portal/AGENTS.md and docker/tranquil-pds/Caddyfile.
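
As a reading aid, the routing table above can be expressed as a first-match path classifier. This is only a sketch in shell, not the Caddyfile itself: the function name `route_handler` and the handler labels are hypothetical, and treating every unmatched path as the portal fallback is an assumption about how the `/` row behaves.

```bash
# Hypothetical helper: maps a request path to the handler named in the table.
# It mirrors the documented routes; it is not generated from the Caddyfile.
route_handler() {
  case "$1" in
    /xrpc/*|/.well-known/*|/oauth/*|/health)
      echo "authz-proxy" ;;       # forwarded to Tranquil behind the proxy
    /app/*)
      echo "portal-spa" ;;        # SPA route, served via index.html
    *)
      echo "portal-fallback" ;;   # assumption: everything else goes to the portal
  esac
}
```

For example, `route_handler /oauth/authorize` prints `authz-proxy`, while `route_handler /app/settings` prints `portal-spa`.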

License: Tranquil PDS is AGPL-3.0. Review AGPL obligations before operating a production instance.

Local dev (Compose)

bash
docker compose --profile gateway up -d --build

| Service | Port | Notes |
| --- | --- | --- |
| pds-authz-proxy | localhost:3000 | Public PDS URL for OAuth UI |
| pds-upstream | internal :2583 | Tranquil PDS from atcr.io (ATCR login in .env) |

Set PDS_AUTHZ_GATE_ENABLED=false only when debugging proxy logic without entitlement rows.
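
After the Compose stack starts, it can help to wait for the proxy before opening the OAuth UI. The sketch below polls a health URL with curl. The helper name `wait_for_health` and the `HEALTH_POLL_SECONDS` variable are hypothetical, and it assumes the local proxy answers `GET /health` on localhost:3000 the same way the production proxy does on 127.0.0.1:3000.

```bash
# Poll a health URL until it answers successfully or the attempts run out.
# Usage: wait_for_health http://localhost:3000/health 30
wait_for_health() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 502.
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "${HEALTH_POLL_SECONDS:-1}"
  done
  echo "not healthy after $tries attempts: $url" >&2
  return 1
}
```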

Production infra (infra/pds)

Environment: pds.substratum.cloud is production. Customer repo and account data live in Postgres tranquil_pds (managed cluster); media blobs in DO Spaces. Treat Pulumi state, managed Postgres, and Spaces as production data — not disposable dev sandboxes.

Per ADR 41: Tranquil PDS upstream; Spaces for blobs; Postgres tranquil_pds for repo/account state (DO managed-cluster daily backups, 7-day retention). The PDS droplet is cattle — no block volume, no droplet automated backups, no volume snapshot CI. Garage v1: single pds.substratum.cloud host.

Pulumi state discipline (production)

Pulumi Cloud stack straiforos-org/substratum-pds/pds is the authoritative map between resource URNs and live DigitalOcean objects (droplet, reserved IP, firewall, DNS, Spaces keys). CI and operators must keep that map aligned with reality.

| Rule | Why |
| --- | --- |
| Never delete PDS droplet, reserved IP, or firewall in the DO console without a matching pulumi destroy or pulumi state delete on an operator workstation | Console-only deletes cause state drift → pulumi refresh fails (404 on reserved IP), Spindle deploys stall |
| Routine changes go through Spindle: push to main or trigger Deploy PDS (infra + apps) | deploy-pds.sh runs refresh → up → verify → app deploy → live verify |
| pulumi refresh reconciles drift when resources still exist in DO but differ from state | It does not repair state after console deletes; fix drift with pulumi destroy locally or explicit pulumi state delete (last resort, ops log required) |
| Do not enable DO droplet automated backups or pay for volume snapshots | Customer repos live in Postgres + Spaces only; boot disk is reprovisioned by cloud-init |
| One operator drives destructive stack ops: announce in ops channel, record stack name, reason, and timestamp | Avoid concurrent local pulumi up while Spindle runs deploy-pds.sh |
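
Before any local stack operation, a drift check can surface console-side changes early. The following is a minimal sketch, assuming the Pulumi CLI's `preview --refresh --expect-no-changes` flags; the function name `check_no_drift` is hypothetical and the default stack name is the one given above.

```bash
# Hypothetical pre-flight: fail if Pulumi would change anything after
# refreshing state against DigitalOcean. Run from infra/pds with the usual
# tokens exported.
check_no_drift() {
  stack=${1:-straiforos-org/substratum-pds/pds}
  if pulumi preview --stack "$stack" --refresh --expect-no-changes >/dev/null; then
    echo "no drift: $stack"
  else
    echo "drift or pending changes on $stack; stop and coordinate in the ops channel" >&2
    return 1
  fi
}
```

A non-zero exit here does not say which resource drifted; it only signals that a closer look is needed before proceeding.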

Approved change classes

| Goal | Path |
| --- | --- |
| Config / cloud-init / sizing tweak | Push → Spindle pds.yml (or local pulumi up only when coordinating with CI) |
| Droplet replace | Spindle or pulumi up; redeploy Tranquil + authz proxy; Postgres and Spaces unchanged |
| Full greenfield (retire host) | Greenfield reprovision: local pulumi destroy, then push for Spindle up |
| Emergency state repair | pulumi state delete for specific URNs; break-glass only; document before and after in ops log |
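
For the emergency state repair path, one way to honor the "document before and after" rule is to export the state and emit a log line before touching it. This is a sketch under assumptions: the helper names and the log line format are hypothetical, and it relies only on `pulumi stack export --file` and `pulumi state delete --yes` from the Pulumi CLI.

```bash
# Hypothetical break-glass wrapper: snapshot the stack state and emit an
# ops-log line before removing one URN from Pulumi state.
ops_log_line() {
  # $1 = stack, $2 = action, $3 = reason
  printf '%s stack=%s action=%s reason=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3"
}

state_delete_with_backup() {
  stack=$1
  urn=$2
  reason=$3
  backup="state-backup-$(date -u +%Y%m%dT%H%M%SZ).json"
  # Export first so the deletion can be reversed with `pulumi stack import`.
  pulumi stack export --stack "$stack" --file "$backup" || return 1
  ops_log_line "$stack" "state-delete:$urn" "$reason"
  pulumi state delete "$urn" --stack "$stack" --yes
}
```

The printed line is meant to be pasted into the ops log; where that log lives is outside the scope of this sketch.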

Stack outputs are operational contracts. dropletIp, pdsSshPrivateKey, and Spaces credentials in outputs must match live DO before SSH or manual deploy steps.

Durable state and backups

| Data | Where | Backup |
| --- | --- | --- |
| Repos / accounts | Postgres tranquil_pds (infra/data) | DO managed DB: daily, 7-day retention (automatic) |
| Media blobs | DO Spaces | Bucket in same DO project; lifecycle policies optional |
| Droplet boot disk | Ephemeral | None; cloud-init + runbook redeploy |

See Data deployment for Postgres stack outputs. Restore Tranquil from tranquilPdsUri after any droplet replace.

Droplet rebuild

When CI or pulumi up replaces the droplet, cloud-init reprovisions Caddy and Docker on a fresh VM. Postgres tranquil_pds and PDS blobs in Spaces are unaffected. Spindle deploy-pds.sh redeploys Tranquil and the authz proxy automatically from Pulumi outputs.

Greenfield reprovision

Destructive production procedure: use it only when retiring the PDS host. Postgres tranquil_pds and Spaces blobs are independent of this stack, so pulumi destroy removes only the droplet, DNS records, and Spaces keys. Customer data is lost only if you also wipe the data stack or the bucket.

  1. Clear legacy stack keys if present:
bash
cd infra/pds
pulumi stack select pds
pulumi config rm dataDirectory --yes 2>/dev/null || true
pulumi config rm volumeSizeGb --yes 2>/dev/null || true
pulumi config rm volumeName --yes 2>/dev/null || true
pulumi config rm pdsDeployPrivateKey --yes 2>/dev/null || true
pulumi config rm sshPublicKeys --yes 2>/dev/null || true
  2. Destroy the stack locally:
bash
pulumi destroy --yes

Requires PULUMI_ACCESS_TOKEN, DIGITALOCEAN_TOKEN, and the same stack config CI uses. Do not delete the droplet or IP in the DO console instead of running pulumi destroy.

If pulumi destroy fails because resources were already removed in the console (state drift), repair state on the operator workstation before retrying. Record every pulumi state delete in the ops log.

  3. Push to main (or trigger Spindle Deploy PDS infra). CI runs deploy-pds.sh: pulumi refresh → pulumi up → verify.

  4. After CI succeeds, verify-pds-live.sh confirms Tranquil, authz proxy, and public HTTPS 200. No manual app deploy is required when Spindle secrets are set.

  5. Legacy cleanup: delete the orphaned pds-data block volume and any old volume snapshots in the DO console if they remain from pre-stateless stacks.

PDS blobs in Spaces survive stack destroy; wipe the bucket separately in DO only when intentionally resetting blob storage.
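
The destroy step above needs PULUMI_ACCESS_TOKEN and DIGITALOCEAN_TOKEN in the environment. A small pre-flight can report everything that is missing before the destructive command runs. This is a sketch only; `require_env` is a hypothetical helper, not part of the repository scripts.

```bash
# Hypothetical pre-flight for the destroy step: report every missing variable
# instead of failing on the first one.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh; only pass trusted variable names.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Used as `require_env PULUMI_ACCESS_TOKEN DIGITALOCEAN_TOKEN && pulumi destroy --yes`, the destroy runs only when both variables are set and non-empty.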

CI (Spindle)

.tangled/workflows/pds.yml — on main push (or manual):

  1. deploy-pds.sh: refresh → pulumi up → infra verify → Tranquil DB grants → deploy-pds-app.sh → verify-pds-live.sh

Full stack deploy. CI provisions the droplet (Caddy TLS, Docker, DNS), reads infra/data outputs (tranquilPdsUri, databaseUri) and infra/pds outputs (Spaces keys, pdsJwtSecret, domain), renders env files on the droplet, deploys Tranquil (Docker) + authz proxy (systemd), and asserts live health.
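
The stage order described above can be outlined as a fail-fast chain. This is an illustrative sketch, not the real deploy-pds.sh: the `run_step` and `deploy_pipeline` names and the `DRY_RUN` switch are hypothetical, and the paths for verify-pds.sh and bootstrap-tranquil-pds-db.sh are assumed to sit next to the two scripts this runbook references under scripts/ci.

```bash
# Illustrative outline of the documented order; not the real deploy-pds.sh.
# Set DRY_RUN=1 to print the steps instead of running them.
run_step() {
  echo "==> $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  "$@"
}

deploy_pipeline() {
  # Each step runs only if the previous one succeeded.
  # Paths for verify-pds.sh and bootstrap-tranquil-pds-db.sh are assumptions.
  run_step pulumi refresh --yes &&
    run_step pulumi up --yes &&
    run_step ./scripts/ci/verify-pds.sh &&
    run_step ./scripts/ci/bootstrap-tranquil-pds-db.sh &&
    run_step ./scripts/ci/deploy-pds-app.sh &&
    run_step ./scripts/ci/verify-pds-live.sh
}
```

Running `DRY_RUN=1 deploy_pipeline` prints the six stages in order, which is a quick way to review the sequence without touching the stack.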

| Script | When | Pass criteria |
| --- | --- | --- |
| verify-pds.sh | Mid-pipeline (before app deploy) | Infra bootstrap; public HTTPS any status (502 OK until app step) |
| deploy-pds-app.sh | Every deploy-pds.sh | Renders env from Pulumi; Tranquil :2583 + authz :3000 healthy on droplet |
| verify-pds-live.sh | End of deploy-pds.sh | Tranquil :2583/xrpc/_health, authz :3000/health, portal / HTML, OAuth metadata JSON, public HTTPS 200 |

CI assumes Pulumi state matches DigitalOcean. Console-only deletes block refresh until an operator repairs state locally (see Pulumi state discipline).

Spindle secrets: PULUMI_ACCESS_TOKEN, DIGITALOCEAN_TOKEN, ATCR_HANDLE, ATCR_APP_PASSWORD, optional PDS_JWT_SECRET (or set pdsJwtSecret on the PDS Pulumi stack), optional DIGITALOCEAN_PROJECT_ID, PDS_DOMAIN (default pds.substratum.cloud), PDS_SSH_HOST.

Deploy SSH key comes from Pulumi stack output pdsSshPrivateKey — not a Spindle secret.

Prerequisites: infra/data stack applied (tranquilPdsUri, databaseUri outputs). First deploy: set pdsJwtSecret once (pulumi config set pdsJwtSecret "$(openssl rand -base64 48)" --secret in infra/pds) or provide PDS_JWT_SECRET in Spindle.

Operator reference (manual / debug)

Use these when debugging outside CI or when Spindle cannot reach managed Postgres for grants.

Tranquil Postgres grants

bootstrap-tranquil-pds-db.sh runs automatically in deploy-pds.sh. If the runner is firewall-blocked, SSH to a droplet tagged substratum and run it manually.

Re-deploy apps only

bash
export PULUMI_ACCESS_TOKEN=… DIGITALOCEAN_TOKEN=… ATCR_HANDLE=… ATCR_APP_PASSWORD=…
./scripts/ci/deploy-pds-app.sh
./scripts/ci/verify-pds-live.sh

Example env files

infra/pds/tranquil-pds.env.example documents the shape of /etc/tranquil-pds/tranquil.env (CI renders this from stack outputs). Authz proxy env: /etc/substratum/pds-authz-proxy.env (DATABASE_URL = entitlements databaseUri from data stack).

| File | Purpose |
| --- | --- |
| docker-compose.production.yml | Tranquil upstream on loopback :2583 |
| substratum-pds-authz-proxy.service | systemd unit for authz proxy on :3000 |
| tranquil-pds.env.example | Reference only; not copied manually in CI |

Set JWT_SECRET / PDS_JWT_SECRET to the same value (ADR 37 R19). CI uses pdsJwtSecret Pulumi output or PDS_JWT_SECRET Spindle secret.
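
When debugging on the droplet, the shared-secret requirement can be checked without printing either value. The sketch below is hypothetical: the helper names are illustrative, and it assumes JWT_SECRET lives in the Tranquil env file and PDS_JWT_SECRET in the authz proxy env file, which this runbook does not state explicitly.

```bash
# Read KEY=value from an env file (last assignment wins; value returned as-is).
env_value() {
  # $1 = file, $2 = key
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Hypothetical check; which file holds which key is an assumption.
jwt_secrets_match() {
  tranquil_env=${1:-/etc/tranquil-pds/tranquil.env}
  proxy_env=${2:-/etc/substratum/pds-authz-proxy.env}
  a=$(env_value "$tranquil_env" JWT_SECRET)
  b=$(env_value "$proxy_env" PDS_JWT_SECRET)
  # Compare without echoing the secret itself.
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "jwt secrets match"
  else
    echo "jwt secrets differ or are unset" >&2
    return 1
  fi
}
```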

Verify live (strict)

bash
./scripts/ci/verify-pds-live.sh
# or: pnpm run pds:verify-live

Checks (in order):

  1. SSH — Tranquil GET /xrpc/_health on 127.0.0.1:2583
  2. SSH — authz proxy GET /health on 127.0.0.1:3000 (body ok)
  3. Public — https://pds.substratum.cloud/ returns HTTP 200 (not 502)
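
For ad hoc debugging, check 3 can be approximated with curl alone. This is a sketch, not the contents of verify-pds-live.sh; the helper names `http_status` and `check_public_200` are hypothetical.

```bash
# Print only the HTTP status code for a URL ("000" if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Hypothetical re-creation of check 3: the public root must answer 200.
check_public_200() {
  url=${1:-https://pds.substratum.cloud/}
  code=$(http_status "$url")
  if [ "$code" = "200" ]; then
    echo "ok: $url -> $code"
  else
    echo "fail: $url -> $code (expected 200)" >&2
    return 1
  fi
}

# Checks 1 and 2 run on the droplet, for example over SSH:
#   curl -fsS http://127.0.0.1:2583/xrpc/_health
#   curl -fsS http://127.0.0.1:3000/health
```

A 502 from the public URL while both loopback checks pass usually points at the Caddy layer rather than the applications.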

Lapsed DID: putRecord → 403 substratum.entitlement.metadata_write_denied; deleteRecord → allow.

Invite-only signup

Garage v1 customer registration flows through POST /api/v1/pds/signup on the gateway (creates one-time invite codes). Do not enable open PDS createAccount on the public host until the authz proxy is live (ADR 37 C10).

Staff DIDs (Phase 4 prep)

  1. Create staff handle on Substratum PDS (invite flow).
  2. Insert operator_role row for staff did (see entitlement-admin.md).
  3. Verify staff PDS OAuth before admin.* ships.