Swarm Command Security Gaps
Last Updated: 2026-06-04
This document records known authentication and authorization gaps around SwarmCommand handling in crates/retrieval/src/swarm.rs and related ingress call sites. It complements Retrieval Layer Gaps (implementation roadmap) with a security-focused audit.
Share remediation: Phased hardening (grantee E2E, accessRemovalAck, replication pin token, mesh grantee-deny) is tracked in Share known issues and Share remediation plan.
Design intent: GetBlock is the authorized read path — it must enforce passport ACL via the PDS (LocalPassportAccessControlResolver) after the caller identity is cryptographically bound. Do not add parallel “trusted local read” commands that skip swarm AuthN/Z.
Summary
| Command / path | AuthN | AuthZ | Main gap |
|---|---|---|---|
| GetBlock | Yes (JWT + requester_did) | Yes (PDS passport) | PDS vs DB ACL desync if receipts not on owner repo |
| PutBlock | Yes (JWT) | Owner-only + CID binding | No receipt/drive binding; no quota at swarm layer |
| Pin | Yes (replication JWT) | Yes (PDS) | Grantees can pin; fire-and-forget replication (§3) |
| Unpin | Planned (ADR 35) | Owner-only unpin JWT | Not implemented; triangle retains blocks after catalog delete |
| InjectConnection | Yes (JWT, HTTP + swarm) | DID must match JWT | libp2p + PSK trust boundary after inject |
| Replication (inbound) | Yes (JWT in PinRequest) | Yes (PDS) | Exposed libp2p surface; always returns PinResponse |
| Replication unpin (inbound) | Planned | Owner unpin JWT | Symmetric teardown for delete (ADR 35) |
Remediated (2026-05-20)
| Item | Fix |
|---|---|
| §1 GetBlock AuthN | SwarmCommand::GetBlock includes jwt; swarm validates jwt.sub == requester_did before PDS is_authorized. Ingress passes session JWT from content.rs and ipfs/gateway.rs. |
| §2 PutBlock CID integrity | Swarm recomputes CID from data (block_cid::cid_from_bytes) and rejects mismatch; optional SWARM_MAX_BLOCK_BYTES cap. |
| §3 Pin PDS routing + rkey | LocalPassportAccessControlResolver uses HandleResolveConfig (home PDS → AppView → PLC #atproto_pds) and receipt_record_rkey for getRecord — fixes unauthorized Pin attempt for local *.test repos (share known issues). |
| §7 Receipt repo correctness | ADR 28 enqueue-only outbox + ReceiptSyncWorker converge the Postgres catalog (receipt_sync, receipt_cid) on owner-repo receipts; HTTP returns receipt_sync: pending with a provisional receipt_cid; grantee self-removal writes accessRemovalRequest on the grantee PDS, then converges the owner receipt asynchronously. Pin runs before the catalog is marked synced; pin failure retries/poisons the job (not best-effort). |
| §3 Replication-scoped pin JWT | create_replication_pin_jwt binds sub, cid, owner_did, receipt_cid; GatewayPinPort and inbound /substratum/replication/1.0.0 verify claims match PinRequest before store. Session JWTs are not placed on the libp2p wire. |
| §8 Test harness | Integration tests always use the enforcing swarm mock in crates/ingress/tests/common/mod.rs. |
1. GetBlock — remediated at swarm layer
Location: crates/retrieval/src/swarm.rs (SwarmCommand::GetBlock)
Enforced today:
- Gateway session JWT validated; requester_did must match JWT sub.
- AccessControlResolver::is_authorized (PDS passport on owner repo).
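The check order matters: identity binding happens before the PDS lookup, so a caller cannot ask the resolver about a DID it does not hold. A minimal sketch of that order, assuming a JWT whose signature and expiry were already verified. SessionClaims, PassportAcl, StaticAcl, and authorize_get_block are hypothetical names; PassportAcl is a simplified stand-in for AccessControlResolver::is_authorized, whose real signature may carry more context (the Pin path passes receipt_cid).

```rust
use std::collections::HashSet;

/// Hypothetical decoded session claims (signature/expiry assumed already checked).
pub struct SessionClaims {
    pub sub: String,
}

#[derive(Debug, PartialEq)]
pub enum GetBlockError {
    /// jwt.sub != requester_did: caller is claiming a DID it does not hold.
    SubjectMismatch,
    /// Identity is bound, but the owner's passport does not grant access.
    Forbidden,
}

/// Simplified stand-in for AccessControlResolver::is_authorized (PDS passport).
pub trait PassportAcl {
    fn is_authorized(&self, cid: &str, requester_did: &str, owner_did: &str) -> bool;
}

/// Toy resolver: the owner is always allowed; grantees only for listed (cid, did) pairs.
pub struct StaticAcl {
    pub grants: HashSet<(String, String)>,
}

impl PassportAcl for StaticAcl {
    fn is_authorized(&self, cid: &str, requester_did: &str, owner_did: &str) -> bool {
        requester_did == owner_did
            || self.grants.contains(&(cid.to_string(), requester_did.to_string()))
    }
}

pub fn authorize_get_block(
    acl: &dyn PassportAcl,
    claims: &SessionClaims,
    cid: &str,
    requester_did: &str,
    owner_did: &str,
) -> Result<(), GetBlockError> {
    // AuthN binding first: the caller may only act as the DID in its own JWT.
    if claims.sub != requester_did {
        return Err(GetBlockError::SubjectMismatch);
    }
    // AuthZ second: only a bound identity is checked against the passport.
    if !acl.is_authorized(cid, requester_did, owner_did) {
        return Err(GetBlockError::Forbidden);
    }
    Ok(())
}
```

The interesting case is a grantee presenting its own valid JWT while claiming requester_did equal to the owner: it fails on the subject check even though the owner would pass the PDS check.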
Remaining gap: HTTP may authorize from Postgres while swarm uses PDS — keep DB and owner-repo receipts in sync on every ACL change.
2. PutBlock — AuthN + CID binding; weak scope AuthZ
Location: crates/retrieval/src/swarm.rs (SwarmCommand::PutBlock)
Enforced today:
- Valid gateway session JWT; requester_did == owner_did.
- CID must match hash(data) in the swarm handler.
- Block size capped at SWARM_MAX_BLOCK_BYTES (25 MiB).
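The three PutBlock checks can be sketched as one validation function. This is an illustration only: validate_put_block and DEFAULT_MAX_BLOCK_BYTES are hypothetical names, and CID computation is injected as a closure standing in for block_cid::cid_from_bytes so the sketch needs no multihash crate.

```rust
/// 25 MiB, the cap this document gives for SWARM_MAX_BLOCK_BYTES (hypothetical constant name).
pub const DEFAULT_MAX_BLOCK_BYTES: usize = 25 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum PutBlockError {
    NotOwner,
    TooLarge,
    CidMismatch,
}

pub fn validate_put_block<F>(
    jwt_sub: &str,
    requester_did: &str,
    owner_did: &str,
    claimed_cid: &str,
    data: &[u8],
    max_bytes: usize,
    cid_from_bytes: F, // stand-in for block_cid::cid_from_bytes
) -> Result<(), PutBlockError>
where
    F: Fn(&[u8]) -> String,
{
    // Owner-only: the JWT subject, the requester, and the owner must all agree.
    if jwt_sub != requester_did || requester_did != owner_did {
        return Err(PutBlockError::NotOwner);
    }
    // Size cap before hashing, so oversized payloads are rejected cheaply.
    if data.len() > max_bytes {
        return Err(PutBlockError::TooLarge);
    }
    // CID binding: recompute from the bytes instead of trusting the caller.
    if cid_from_bytes(data) != claimed_cid {
        return Err(PutBlockError::CidMismatch);
    }
    Ok(())
}
```

Nothing here ties the block to a receipt, drive, or quota, which is exactly the "Not enforced" list below.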
Not enforced:
| Gap | Impact |
|---|---|
| No receipt / drive / path binding | Owner can store arbitrary bytes under a matching CID |
| No quota at swarm layer | Storage exhaustion / availability (DoS) |
| No tie to upload session | Intentional for “store before receipt,” but broad |
3. Pin — replication-scoped JWT + PDS ACL (remediated)
Location: crates/retrieval/src/swarm.rs (SwarmCommand::Pin), apps/gateway/src/receipt_sync_pin.rs
Enforced today:
- Replication-scoped JWT (ReplicationPinClaims in crates/auth/src/token.rs): create_replication_pin_jwt / verify_replication_pin_jwt bind sub, cid, owner_did, receipt_cid, and TTL. PinRequest.jwt on libp2p is never a session JWT.
- Local Pin and inbound replication verify that JWT claims match PinRequest fields before is_authorized + store.
- is_authorized(cid, requester_did, owner_did, receipt_cid) before read + replicate.
- Owner PDS routing + hashed receipt rkey (see Remediated).
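The claims-match step is what keeps a replication JWT from being reused for a different block or receipt. A sketch, assuming the field names used in this document; ReplicationPinClaims and PinRequest below are hypothetical mirrors (including the requester_did field name), not the definitions in crates/auth/src/token.rs, and signature verification is assumed to have already happened.

```rust
/// Hypothetical mirror of the signed claims; exp is seconds since the Unix epoch.
#[derive(Debug, Clone)]
pub struct ReplicationPinClaims {
    pub sub: String,
    pub cid: String,
    pub owner_did: String,
    pub receipt_cid: String,
    pub exp: u64,
}

/// Hypothetical fields carried on the libp2p wire next to the JWT.
#[derive(Debug, Clone)]
pub struct PinRequest {
    pub requester_did: String,
    pub cid: String,
    pub owner_did: String,
    pub receipt_cid: String,
}

#[derive(Debug, PartialEq)]
pub enum PinClaimError {
    Expired,
    Mismatch(&'static str),
}

pub fn check_pin_claims(
    claims: &ReplicationPinClaims,
    req: &PinRequest,
    now_secs: u64,
) -> Result<(), PinClaimError> {
    if claims.exp <= now_secs {
        return Err(PinClaimError::Expired);
    }
    // Every wire field must equal its signed counterpart; report the first mismatch.
    let pairs = [
        ("sub", &claims.sub, &req.requester_did),
        ("cid", &claims.cid, &req.cid),
        ("owner_did", &claims.owner_did, &req.owner_did),
        ("receipt_cid", &claims.receipt_cid, &req.receipt_cid),
    ];
    for (field, signed, wire) in pairs {
        if signed != wire {
            return Err(PinClaimError::Mismatch(field));
        }
    }
    Ok(())
}
```

Only after this passes does is_authorized run, so the PDS lookup always sees values the token issuer signed.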
Remaining gaps:
| Gap | Notes |
|---|---|
| Grantees can pin | No requirement that requester_did == owner_did; grantees on the receipt may trigger replication (may be product-intended). |
| Fire-and-forget replication | Pin returns Ok(()) after send_request; there is no confirmation that peers persisted the block. |
4. Unpin — planned (ADR 35, not shipped)
Location (planned): SwarmCommand::Unpin, inbound handler on /substratum/replication/1.1.0 (or extended CBOR request type).
Requirement: When an owner deletes a file and catalog refcount for asset_cid is zero, the originating gateway must remove the CID from local blockstore and send unpin to every peer in PINNING_TARGETS (Global Triangle), symmetric to upload pin (GatewayPinPort).
Auth model (planned):
- Owner-scoped unpin JWT: sub == owner_did, binds cid + owner_did; no receipt_cid (the receipt may already be tombstoned on PDS).
- Inbound: verify JWT claims, then blockstore.remove(cid). Do not call is_authorized (receipt delete is async).
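A sketch of the planned inbound unpin check, assuming owner-scoped claims with sub, cid, owner_did, and an expiry. All names are hypothetical since ADR 35 is not shipped, and a HashMap stands in for the blockstore.

```rust
use std::collections::HashMap;

/// Hypothetical owner-scoped unpin claims; deliberately no receipt_cid.
pub struct UnpinClaims {
    pub sub: String,
    pub cid: String,
    pub owner_did: String,
    pub exp: u64,
}

/// Hypothetical wire request for the planned unpin protocol.
pub struct UnpinRequest {
    pub cid: String,
    pub owner_did: String,
}

#[derive(Debug, PartialEq)]
pub enum UnpinError {
    Expired,
    NotOwner,
    Mismatch,
}

/// Returns Ok(true) if a block was removed, Ok(false) if it was already gone.
pub fn handle_inbound_unpin(
    blockstore: &mut HashMap<String, Vec<u8>>,
    claims: &UnpinClaims,
    req: &UnpinRequest,
    now_secs: u64,
) -> Result<bool, UnpinError> {
    if claims.exp <= now_secs {
        return Err(UnpinError::Expired);
    }
    // Owner-scoped: a grantee-minted token can never authorize an unpin.
    if claims.sub != claims.owner_did {
        return Err(UnpinError::NotOwner);
    }
    if claims.cid != req.cid || claims.owner_did != req.owner_did {
        return Err(UnpinError::Mismatch);
    }
    // No is_authorized call: the receipt may already be deleted on the PDS.
    // Removing an absent CID is not an error, so retries stay idempotent.
    Ok(blockstore.remove(&req.cid).is_some())
}
```

Idempotent removal matters because the planned outbox worker may redeliver the same unpin after a partial failure.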
Gaps today: Delete removes catalog rows only; triangle nodes keep replicated blocks indefinitely.
Planned durability (ADR 35): mesh_unpin_outbox + MeshUnpinWorker — not inline-only RPC. Rows that exhaust retries enter failed (DLQ) with metrics for operators (same poison pattern as receipt sync).
5. InjectConnection — dual validation, mesh trust remains
Locations:
- HTTP: crates/ingress/src/router/handlers/swarm.rs (GET /swarm/{did})
- Swarm: SwarmCommand::InjectConnection
Enforced today:
- HTTP: Bearer/cookie JWT, verify_jwt, DID in path must match session.
- Swarm: Re-validate JWT; requester_did == did before injecting the stream.
Remaining concerns:
- Route is /swarm/{did}, not under /api/v1, so it is easy to omit from security reviews and edge rules.
- After inject, the stream joins the shared libp2p swarm; security depends on the PSK (SWARM_MASTER_SECRET + per-DID derivation) and the sidecar trust model.
- No rate limiting on WebSocket upgrade / inject.
6. Inbound replication (not a SwarmCommand, same blockstore)
Location: SwarmActor::handle_swarm_event — SubstratumBehaviour request/response on /substratum/replication/1.0.0
Enforced today:
- Replication-scoped JWT in PinRequest → validate_replication_pin (claims must match request fields).
- is_authorized before blockstore.put.
Gaps:
| Issue | Notes |
|---|---|
| Open libp2p replication protocol | Any peer that can reach the gateway libp2p port may send PinRequest; defense is JWT + PDS only |
| Always sends PinResponse | Even on auth failure: minor oracle (reveals that the handler was reached) |
| Wire-trusted owner_did / receipt_cid | Mitigated by pairing with JWT DID + is_authorized |
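The inbound handler's ordering, and why it is only a minor oracle, can be sketched as follows. This is a simplified illustration: InboundPin carries the block bytes directly (an assumption made for brevity), the two closures stand in for validate_replication_pin and is_authorized, and a HashMap stands in for the blockstore.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified inbound request (block bytes assumed inline).
pub struct InboundPin {
    pub cid: String,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub struct PinResponse {
    pub ok: bool,
}

pub fn handle_inbound_pin<V, A>(
    blockstore: &mut HashMap<String, Vec<u8>>,
    req: &InboundPin,
    claims_match: V,  // stand-in for validate_replication_pin
    is_authorized: A, // stand-in for the PDS passport check
) -> PinResponse
where
    V: Fn(&InboundPin) -> bool,
    A: Fn(&InboundPin) -> bool,
{
    // Both rejection paths return the same response, so a peer cannot tell
    // a bad token from an ACL denial; it only learns the handler was reached.
    if !claims_match(req) {
        return PinResponse { ok: false };
    }
    if !is_authorized(req) {
        return PinResponse { ok: false };
    }
    blockstore.insert(req.cid.clone(), req.data.clone());
    PinResponse { ok: true }
}
```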
7. HTTP ↔ swarm ACL mismatch — outbox/catalog convergence (ADR 28)
Ingress often authorizes from Postgres (enforce_passport_access_control on DB-derived access_control). Swarm GetBlock authorizes from PDS via LocalPassportAccessControlResolver.
Fixed (ADR 28, enqueue-only):
- HTTP handlers call enqueue_receipt_sync in the same transaction as catalog ACL updates; there is no inline PDS createRecord on the request path. Responses return receipt_sync: pending and a provisional receipt_cid (typically the asset CID); Postgres ACL remains authoritative for HTTP until sync completes.
- Transactional outbox (receipt_sync_outbox) holds owner-repo receipt upserts and owner receipt updates after grantee removal intent.
- ReceiptSyncWorker (gateway, RECEIPT_SYNC_ENABLED) claims jobs under worker RLS, writes cloud.substratum.passport.receipt on the owner repo (and grantee accessRemovalRequest when applicable) via stored OAuth, updates catalog receipt_cid and receipt_sync (pending → synced/failed), and may pin via a short-lived pin JWT when asset_cid_for_pin is set.
- Grantee self-removal: grantee OAuth writes accessRemovalRequest on the grantee PDS, then enqueues owner receipt convergence; mesh ACL follows the owner receipt until Phase B completes.
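The receipt_sync transitions (pending → synced/failed) follow the usual outbox retry/poison pattern. A sketch, assuming a per-job attempt counter and a max_attempts threshold; OutboxJob, record_attempt, and the threshold are hypothetical and not taken from ReceiptSyncWorker.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ReceiptSync {
    Pending,
    Synced,
    Failed,
}

pub struct OutboxJob {
    pub attempts: u32,
    pub state: ReceiptSync,
}

impl OutboxJob {
    pub fn new() -> Self {
        OutboxJob { attempts: 0, state: ReceiptSync::Pending }
    }

    /// Record one worker attempt and return the resulting state.
    pub fn record_attempt(&mut self, succeeded: bool, max_attempts: u32) -> ReceiptSync {
        // Terminal states are sticky: a late success cannot revive a poisoned job.
        if self.state != ReceiptSync::Pending {
            return self.state;
        }
        self.attempts += 1;
        self.state = if succeeded {
            ReceiptSync::Synced
        } else if self.attempts >= max_attempts {
            ReceiptSync::Failed // poisoned: surfaced to operators, not retried
        } else {
            ReceiptSync::Pending // retried on a later claim
        };
        self.state
    }
}
```

While a job is Pending, HTTP and swarm may disagree on the ACL; that is the window described next.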
Remaining: Eventual-consistency window until the worker succeeds; monitor poison/failed jobs and owner OAuth expiry. Orphan grantee removal intents if the owner never syncs (ADR 28 negative — TTL/policy TBD). Mesh grantee-deny (Phase 2b): LocalPassportAccessControlResolver denies non-owner requesters when a matching grantee-repo accessRemovalRequest exists (ADR 28).
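The grantee-deny rule can be read as a small predicate evaluated alongside the owner receipt lookup. A sketch, assuming removal intents are keyed by (grantee DID, receipt key); mesh_allows and that key shape are hypothetical. It shows why a removed grantee is denied even while the not-yet-converged owner receipt still lists them.

```rust
use std::collections::HashSet;

/// owner_did: receipt owner; grantees: DIDs on the owner receipt;
/// removals: hypothetical (grantee_did, receipt_key) pairs found on grantee repos.
pub fn mesh_allows(
    owner_did: &str,
    grantees: &HashSet<String>,
    removals: &HashSet<(String, String)>,
    requester_did: &str,
    receipt_key: &str,
) -> bool {
    // The owner is never subject to grantee-deny.
    if requester_did == owner_did {
        return true;
    }
    // Non-owners must appear on the owner receipt...
    if !grantees.contains(requester_did) {
        return false;
    }
    // ...and must not have a matching removal intent on their own repo.
    !removals.contains(&(requester_did.to_string(), receipt_key.to_string()))
}
```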
8. Test and integration harness — remediated
Location: crates/ingress/tests/common/mod.rs
All integration tests use a single enforcing swarm mock:
- JWT validation on PutBlock and GetBlock
- Owner-only PutBlock with CID binding (cid_from_bytes)
- JWT-bound GetBlock (jwt.sub must match requester_did)
There is no permissive test mode. Tests that seed blocks call SwarmCommand::PutBlock with a valid session JWT and matching CID; drive-isolation and ACL tests that do not touch the blockstore are unaffected.
Security-focused tests: swarm_security_integration.rs, acl_pds_sync_integration.rs, and extended coverage in api_integration.rs.
9. Channel and API surface
| Concern | Notes |
|---|---|
| AppState.swarm_tx cloneable | Every handler gets an unauthenticated command sink unless each send site is audited |
| Public store_block_in_swarm | New callers must pass owner_did + jwt; swarm verifies CID |
| mpsc capacity (32) | Command flooding can stall handlers waiting on oneshot responses |
| Mobile FFI (crates/mobile-ffi) | Holds swarm_tx internally; low risk until FFI exposes read/write without JWT fields |
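One way to narrow the swarm_tx capability and add backpressure at the same time is a wrapper type whose only send methods take the identity fields and which fails fast when the queue is full. A sketch under stated assumptions: SwarmSender and SwarmSendError are hypothetical, std::sync::mpsc::sync_channel stands in for the async channel the gateway actually uses, and the non-empty JWT check only ensures the field is present (the swarm still validates it).

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Simplified command with only the fields needed to show the idea.
#[derive(Debug)]
pub enum SwarmCommand {
    GetBlock { cid: String, requester_did: String, jwt: String },
}

#[derive(Debug, PartialEq)]
pub enum SwarmSendError {
    /// No JWT supplied; rejected before touching the queue.
    MissingJwt,
    /// Queue full: caller should shed load instead of blocking.
    Busy,
    /// Swarm actor has shut down.
    Closed,
}

/// Handlers hold this instead of a raw sender, so every send carries identity.
#[derive(Clone)]
pub struct SwarmSender {
    tx: SyncSender<SwarmCommand>,
}

impl SwarmSender {
    /// Bounded queue (this document notes a capacity of 32 today).
    pub fn channel(capacity: usize) -> (Self, Receiver<SwarmCommand>) {
        let (tx, rx) = sync_channel(capacity);
        (SwarmSender { tx }, rx)
    }

    pub fn get_block(&self, cid: &str, requester_did: &str, jwt: &str) -> Result<(), SwarmSendError> {
        if jwt.is_empty() {
            return Err(SwarmSendError::MissingJwt);
        }
        let cmd = SwarmCommand::GetBlock {
            cid: cid.to_string(),
            requester_did: requester_did.to_string(),
            jwt: jwt.to_string(),
        };
        // try_send never blocks: a full queue becomes an explicit error.
        match self.tx.try_send(cmd) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(_)) => Err(SwarmSendError::Busy),
            Err(TrySendError::Disconnected(_)) => Err(SwarmSendError::Closed),
        }
    }
}
```

Keeping the raw sender private to the module is what turns swarm_tx from an ambient capability into an audited one.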
10. Related configuration
| Variable | Role |
|---|---|
| PDS_URL | Home PDS for HandleResolveConfig; first hop for owner describeRepo / getRecord in swarm ACL |
| ATPROTO_APPVIEW_URL | Second hop when the repo is not on the home PDS; also handle/profile resolution |
| PDS_HANDLE_DOMAIN | Local handle domain (e.g. test); part of HandleResolveConfig |
| SWARM_MASTER_SECRET | Master secret for per-DID PSK derivation on injected WebSocket streams |
Misconfigured URLs (e.g. only public bsky.app while receipts live on local PDS) cause GetBlock / Pin to fail getRecord until PDS_URL + AppView fallback resolve the owner repo — availability issue, not a bypass. See share known issues.
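The fallback order (home PDS, then AppView, then PLC #atproto_pds) is a first-match search over hops. A sketch with stub lookups; Hop, hop, and resolve_owner_pds are hypothetical names, and the closures stand in for the real describeRepo and PLC requests.

```rust
/// One resolution hop: a label plus a lookup that may fail to find the repo.
pub type Hop = (&'static str, Box<dyn Fn(&str) -> Option<String>>);

/// Helper to box a lookup closure into a Hop.
pub fn hop<F>(name: &'static str, lookup: F) -> Hop
where
    F: Fn(&str) -> Option<String> + 'static,
{
    (name, Box::new(lookup))
}

/// Returns the first hop that resolves the DID, with the endpoint it found.
pub fn resolve_owner_pds(did: &str, hops: &[Hop]) -> Option<(&'static str, String)> {
    hops.iter()
        .find_map(|(name, lookup)| lookup(did).map(|url| (*name, url)))
}
```

A misconfigured first hop only costs a fallthrough to the next one; resolution fails (availability, not bypass) only when every hop misses.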
Recommended remediation order (remaining)
- Grantee E2E green: pnpm run e2e:share -- --grep "grantee sees shared" (share known issues).
- accessRemovalAck on owner repo after grantee removal (Phase 2d).
- Ingress share-spam guards: ACL PATCH rate limit and grantee cap (Phase 2c).
- Document swarm_tx as a trusted capability: narrow wrapper or module boundary around command sends.
- Rate limit /swarm/{did}: Tower governor or nginx.
- mpsc backpressure: bounded queue + drop policy (optional load test).
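For the /swarm/{did} rate limit, the core idea behind a governor-style limiter is a token bucket per key. A std-only sketch, assuming buckets keyed by the path DID and an explicit clock argument for testability; PerDidLimiter is hypothetical and is not the Tower governor or nginx API.

```rust
use std::collections::HashMap;

pub struct PerDidLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, (f64, f64)>, // did -> (tokens, last_seen_secs)
}

impl PerDidLimiter {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if this inject attempt is allowed at time now_secs.
    pub fn allow(&mut self, did: &str, now_secs: f64) -> bool {
        let (cap, rate) = (self.capacity, self.refill_per_sec);
        // New keys start with a full bucket.
        let (tokens, last) = self.buckets.entry(did.to_string()).or_insert((cap, now_secs));
        // Refill proportionally to elapsed time, never above capacity.
        let elapsed = (now_secs - *last).max(0.0);
        *tokens = (*tokens + elapsed * rate).min(cap);
        *last = now_secs;
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Keying by client IP at the edge is the obvious alternative, since a flood of unauthenticated upgrades can cycle through many path DIDs.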
References
- Swarm actor: crates/retrieval/src/swarm.rs
- Block CID helper: crates/retrieval/src/block_cid.rs
- PDS ACL resolver: crates/retrieval/src/access_control.rs
- Ingress content / upload: crates/ingress/src/router/handlers/drives/content.rs, upload.rs
- IPFS gateway read: crates/ingress/src/router/handlers/ipfs/gateway.rs
- WebSocket inject: crates/ingress/src/router/handlers/swarm.rs
- Receipt issuance: crates/ingress/src/router/handlers/drives/pds.rs
- Retrieval Layer Gaps
- ADR 17: WebSocket PSK Injection
- ADR 27: Zero Trust PDS-Based Provenance
- ADR 28: Receipt Sync Queue and Grantee Access Removal