
ADR 26: TUS Resumable Drive Uploads

Status: Proposed
Date: 2026-05-16

Context

Drive file and folder uploads today use a single multipart/form-data POST to the gateway (POST /api/v1/drives/{drive_id}/upload). The ingress handler in crates/ingress/src/router/handlers/drives/upload.rs buffers each file entirely in memory (field.bytes().await) before running the domain pipeline: store the block in the retrieval swarm, write the passport receipt, pin, and persist drive_entry rows under RLS.

The Web File Explorer (ADR 15, ADR 20) uploads via UploadService and FormData. Folder uploads send many files sequentially with a shared upload_session_id and per-file idempotency_key, but:

  • There is no resumability after network failure.
  • Progress is invisible in the column/list/grid UI until the full batch completes and the tree is refreshed.
  • Server-authoritative progress (receive → swarm → pin → persist) is not exposed; upload_session_id is echoed but not stored for polling or events.
  • Idempotency lookup is not yet implemented server-side.

We need a standard, resumable upload transport that fits the modular monolith (ADR 01, ADR 02): tus handles HTTP byte transfer; ingress retains ownership of drive semantics, auth, and finalize into swarm + DB.

Decision

1. Adopt the tus.io protocol for drive uploads

Use tus 1.0 as the resumable upload transport. Integrate via fileloft (fileloft-core, fileloft-axum, fileloft-store-fs) in crates/ingress, mounted under the drive-scoped path:

text
POST   /api/v1/drives/{drive_id}/tus
PATCH  /api/v1/drives/{drive_id}/tus/{upload_id}
HEAD   /api/v1/drives/{drive_id}/tus/{upload_id}
DELETE /api/v1/drives/{drive_id}/tus/{upload_id}   # optional termination

tus endpoints are authenticated the same way as other drive mutations (AuthenticatedDid, AuthenticatedJwt). Shared / read-only browse paths do not expose tus creation.

2. Separate tus completion from Substratum finalize

tus guarantees only that the bytes are stored in a tus DataStore (filesystem spool in v1). Substratum must still derive a CID for the asset, pin it in the swarm, and write the passport and drive rows.

Add a Substratum-specific finalize step (documented in OpenAPI, not part of tus core):

text
POST /api/v1/drives/{drive_id}/tus/{upload_id}/finalize  → UploadEntryDto

Finalize reuses existing ingress helpers (store_block_in_swarm, persist_uploaded_entry, path sanitization, parent directory creation, ACL inheritance). Do not duplicate domain logic in the tus adapter.

The file explorer marks a file done only after finalize succeeds, not when tus reports 100% offset.
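The distinction between "tus offset reached 100%" and "done" can be sketched as a small state function. This is a minimal illustration, not the explorer's actual code: the enum `ExplorerFileState` and the function `explorer_file_state` are hypothetical names introduced here.

```rust
/// Hypothetical per-file state as the explorer might track it.
#[derive(Debug, PartialEq, Eq)]
enum ExplorerFileState {
    /// tus offset is still below Upload-Length.
    Uploading,
    /// All bytes are in the tus spool, but finalize has not succeeded yet.
    AwaitingFinalize,
    /// Finalize returned successfully; only now is the file shown as done.
    Done,
}

/// Derive the state from the tus offset and the finalize outcome.
/// A full offset alone never yields `Done`.
fn explorer_file_state(offset: u64, length: u64, finalize_succeeded: bool) -> ExplorerFileState {
    if offset < length {
        ExplorerFileState::Uploading
    } else if !finalize_succeeded {
        ExplorerFileState::AwaitingFinalize
    } else {
        ExplorerFileState::Done
    }
}
```

With this shape, a row whose PATCH requests have all completed stays in `AwaitingFinalize` until the finalize call returns.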

3. Drive context via tus Upload-Metadata

Encode drive placement in tus Upload-Metadata (base64 key/value), validated on creation:

| Key | Purpose |
| --- | --- |
| path | Parent folder path inside the drive (empty = drive root) |
| relative_path | Full relative path for folder uploads (e.g. Assets/photo.png) |
| filename | Leaf name when needed |
| batch_id | Correlates folder-upload files (replaces ad-hoc upload_session_id in FormData) |
| idempotency_key | {batch_id}:{relative_path} or per-file key |
| content_type | Optional MIME hint |

Paths must pass the same sanitization rules as today's multipart upload (sanitize_rel_path, sanitize_segment).

Enforce Upload-Length ≤ UPLOAD_MAX_FILE_BYTES (global ceiling 1 TiB per ADR 32; account plans may be lower on SaaS).
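As an illustration of creation-time validation, the sketch below parses a tus Upload-Metadata header (comma-separated pairs of a key and a Base64 value, where the value may be omitted) and checks Upload-Length against the ceiling. It uses only the standard library, so the Base64 decoder is hand-written. The function names `parse_upload_metadata` and `check_upload_length` are hypothetical, the constant value assumes the 1 TiB global ceiling, and in practice fileloft or a Base64 crate would likely do the decoding.

```rust
use std::collections::HashMap;

/// Assumed value for illustration: the 1 TiB global ceiling.
const UPLOAD_MAX_FILE_BYTES: u64 = 1 << 40;

/// Decode standard Base64 (RFC 4648, `=` padding). Returns None on invalid input.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    fn sextet(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a') as u32 + 26),
            b'0'..=b'9' => Some((c - b'0') as u32 + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    let groups = bytes.len() / 4;
    let mut out = Vec::with_capacity(groups * 3);
    for (i, chunk) in bytes.chunks(4).enumerate() {
        // Padding is only legal at the end of the final group.
        let pad = chunk.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 || (pad > 0 && i + 1 != groups) {
            return None;
        }
        let mut acc: u32 = 0;
        for &c in &chunk[..4 - pad] {
            acc = (acc << 6) | sextet(c)?;
        }
        acc <<= 6 * pad as u32;
        out.push((acc >> 16) as u8);
        if pad < 2 {
            out.push((acc >> 8) as u8);
        }
        if pad < 1 {
            out.push(acc as u8);
        }
    }
    Some(out)
}

/// Parse `key base64value,key base64value,...` into a map of UTF-8 strings.
fn parse_upload_metadata(header: &str) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    for pair in header.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        // A key without a value is allowed and maps to the empty string.
        let (key, value_b64) = match pair.split_once(' ') {
            Some((k, v)) => (k, v.trim()),
            None => (pair, ""),
        };
        let raw = base64_decode(value_b64)
            .ok_or_else(|| format!("invalid base64 for key `{key}`"))?;
        let value = String::from_utf8(raw)
            .map_err(|_| format!("non-UTF-8 value for key `{key}`"))?;
        if map.insert(key.to_string(), value).is_some() {
            return Err(format!("duplicate key `{key}`"));
        }
    }
    Ok(map)
}

/// Reject uploads whose declared length exceeds the ceiling.
fn check_upload_length(upload_length: u64) -> Result<(), String> {
    if upload_length > UPLOAD_MAX_FILE_BYTES {
        return Err(format!("Upload-Length {upload_length} exceeds {UPLOAD_MAX_FILE_BYTES}"));
    }
    Ok(())
}
```

For example, the header `relative_path QXNzZXRzL3Bob3RvLnBuZw==,batch_id YjE=` parses to `relative_path = Assets/photo.png` and `batch_id = b1`. The decoded paths would still need to pass sanitize_rel_path and sanitize_segment, which are not reproduced here.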

4. Server-reported progress beyond tus offset

tus supplies byte offset progress during PATCH. Substratum publishes phase progress for finalize and folder batches:

| Phase | Meaning |
| --- | --- |
| uploading | tus offset < Upload-Length |
| storing_block | PutBlock to swarm |
| pinning | Swarm pin |
| persisting | DB + passport |
| done | Finalize returned UploadEntryDto |
| error | Terminal failure |

Expose progress via:

text
GET /api/v1/drives/{drive_id}/upload/batches/{batch_id}           # snapshot
GET /api/v1/drives/{drive_id}/upload/batches/{batch_id}/events   # SSE

Reuse the SSE pattern already used for system events (ADR 18 alignment). v1 progress store: in-memory Arc<RwLock<...>> on AppState; document migration to DB/Redis for multi-instance gateways as follow-up.
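A minimal sketch of what the v1 in-memory progress store could look like, assuming a map keyed by batch_id and then relative_path. The type names `Phase`, `FileProgress`, and `ProgressStore` and their fields are hypothetical; the real AppState layout and DTO shapes are not defined by this ADR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Phases from the table above, with their wire names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Uploading,
    StoringBlock,
    Pinning,
    Persisting,
    Done,
    Error,
}

impl Phase {
    fn as_str(self) -> &'static str {
        match self {
            Phase::Uploading => "uploading",
            Phase::StoringBlock => "storing_block",
            Phase::Pinning => "pinning",
            Phase::Persisting => "persisting",
            Phase::Done => "done",
            Phase::Error => "error",
        }
    }

    /// `done` and `error` end a file's lifecycle.
    fn is_terminal(self) -> bool {
        matches!(self, Phase::Done | Phase::Error)
    }
}

/// Hypothetical per-file progress record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileProgress {
    phase: Phase,
    offset: u64,
    length: u64,
}

/// batch_id -> relative_path -> progress. Cloning shares the same map.
#[derive(Clone, Default)]
struct ProgressStore {
    inner: Arc<RwLock<HashMap<String, HashMap<String, FileProgress>>>>,
}

impl ProgressStore {
    /// Insert or overwrite the progress of one file in a batch.
    fn update(&self, batch_id: &str, relative_path: &str, progress: FileProgress) {
        let mut guard = self.inner.write().expect("progress lock poisoned");
        guard
            .entry(batch_id.to_string())
            .or_default()
            .insert(relative_path.to_string(), progress);
    }

    /// Copy of the batch state, as the snapshot endpoint might return it.
    fn snapshot(&self, batch_id: &str) -> Option<HashMap<String, FileProgress>> {
        let guard = self.inner.read().expect("progress lock poisoned");
        guard.get(batch_id).cloned()
    }

    /// True when the batch exists and every file reached a terminal phase.
    fn batch_finished(&self, batch_id: &str) -> bool {
        self.snapshot(batch_id)
            .map(|files| files.values().all(|f| f.phase.is_terminal()))
            .unwrap_or(false)
    }
}
```

Because the state lives in process memory, it is lost on restart and is not visible to other gateway instances, which is the limitation the follow-up migration to DB/Redis addresses.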

5. File explorer integration

  • Add tus-js-client and a TusUploadService implementing the existing upload contract (ADR 20).
  • Flow per file: create tus upload → PATCH chunks → finalize → optional SSE for phases.
  • Folder upload: one batch_id, N tus resources, bounded concurrency (e.g. 2–3).
  • UI: optimistic rows in Miller columns (pending/grey, progress ring, highlight on done) driven by batch SSE + tus onProgress.
  • OpenAPI-generated client covers finalize/batch DTOs only; tus URLs are stable constants aligned with ingress.

6. Keep multipart upload during migration

Retain POST .../upload until tus + finalize + explorer UI are stable. Deprecate in OpenAPI after Phase 5 (see roadmap). No breaking removal without a release note.

7. Ingress module layout

Per crates/ingress/AGENTS.md, extend upload handling as a concern directory:

text
crates/ingress/src/router/handlers/drives/upload/
  mod.rs           # shared finalize, re-exports
  multipart.rs     # existing handle_drive_upload (moved)
  tus.rs           # fileloft mount + metadata auth
  tus_finalize.rs
  batch_events.rs  # SSE + snapshot

8. Operational constraints (v1)

  • Temp storage: filesystem spool under a configurable gateway temp root; TTL cleanup for incomplete uploads (e.g. 24h).
  • Chunk size: default 5 MiB per PATCH (tunable); edge/proxy body limits must allow chunk size, not only full file size.
  • CORS: expose tus headers (Upload-Offset, Upload-Length, Location, Tus-Resumable, Tus-Version).
  • Non-goals (v1): tus concatenation; multi-pod shared spool without shared storage. Mesh block chunking may still use smaller SWARM_MAX_BLOCK_BYTES; that is separate from the ingress upload ceiling.
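The chunk-size and TTL constraints above reduce to simple arithmetic, sketched below under the stated defaults (5 MiB chunks, 24h TTL). The helper names `remaining_patches`, `next_chunk_range`, and `is_expired` are hypothetical and do not correspond to fileloft APIs.

```rust
use std::time::{Duration, SystemTime};

/// Default PATCH body size from the constraints above (tunable).
const DEFAULT_CHUNK_BYTES: u64 = 5 * 1024 * 1024;
/// Example TTL for incomplete uploads (24h).
const INCOMPLETE_UPLOAD_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Number of PATCH requests still needed when resuming from `offset`.
fn remaining_patches(offset: u64, length: u64, chunk: u64) -> u64 {
    assert!(chunk > 0, "chunk size must be positive");
    let remaining = length.saturating_sub(offset);
    remaining / chunk + u64::from(remaining % chunk != 0)
}

/// Half-open byte range [start, end) of the next PATCH body, or None when complete.
fn next_chunk_range(offset: u64, length: u64, chunk: u64) -> Option<(u64, u64)> {
    if offset >= length {
        None
    } else {
        Some((offset, length.min(offset.saturating_add(chunk))))
    }
}

/// Whether an incomplete upload last touched at `last_modified` is past its TTL.
fn is_expired(last_modified: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(last_modified) {
        Ok(age) => age > ttl,
        // Clock moved backwards: keep the spool file rather than delete early.
        Err(_) => false,
    }
}
```

A 12 MiB file therefore takes three PATCH requests from offset 0, and a client that learns from HEAD that the offset is 10 MiB resumes with a single 2 MiB request. Any edge or proxy body limit has to admit at least one chunk of the configured size.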

Consequences

Positive

  • Resumable uploads and standard client libraries (tus-js-client).
  • Honest server-side progress through pin and persist, not only browser XHR progress.
  • Folder uploads gain batch correlation and reconnect via batch_id + SSE.
  • Domain finalize stays in one ingress pipeline; tus remains a replaceable adapter.

Negative

  • Two-step client flow (tus + finalize) increases integration complexity vs one POST.
  • tus routes are not fully described by OpenAPI; finalize/batch endpoints are the codegen surface.
  • Filesystem spool adds disk and cleanup obligations on gateway nodes.
  • Single-gateway in-memory batch state does not survive restart or horizontal scale until Phase 5 hardening.

Deferred Decisions and Constraints

| # | Question | Default |
| --- | --- | --- |
| 1 | Tus implementation: fileloft vs hand-rolled tus | fileloft |
| 2 | Temp store: local FS vs S3 | Local FS on gateway (not S3) unless staging policy changes |
| 3 | Default chunk size | 5 MiB |
| 4 | Multipart coexistence | Yes until UI fully migrates |