ADR 34: Idempotent Directory Creation and Concurrency

Status: Accepted
Date: 2026-06-03
Last Updated: 2026-06-03

Terms (this ADR)

| ID | Term | Meaning |
| --- | --- | --- |
| Rn | Functional requirement | Numbered obligation the system must meet (R1, R2, …). |
| NRn | Non-functional requirement | Quality attribute: security, performance, operability (NR1, NR2, …). |
| Cn | Constraint | Non-negotiable boundary; violating it invalidates the decision (C1, C2, …). |
| CCn | Cross-cutting challenge | Risk or tension that spans components, with a documented mitigation (CC1, …). |
| TUS | Resumable Uploads | Protocol for resumable file uploads (see ADR 26). |

Canonical product vocabulary: Glossary.

Context

Deeply nested folder uploads (e.g., Instagram JSON exports with ~6,000 files and 5+ levels of nesting) trigger a high volume of concurrent TUS finalize requests. Each request ensures its parent directory tree exists in the database.

Without strict idempotency and database-level constraints, concurrent requests for files in the same directory would attempt to create the same directory entry simultaneously. This led to:

  1. Race Conditions: Two or more requests checked for a directory's existence at the same time, each found it missing, and each attempted to insert it.
  2. Duplicate Entries: The drive_entry table allowed multiple rows with the same (drive_id, path). Tree listings became inconsistent, and the UI appeared "truncated": folders seemed to be missing because the API returned an unexpected duplicate or the wrong entry was indexed.
  3. E2E Flakiness: Large-scale tests were unreliable due to synchronization issues between the frontend upload queue and the backend persistence layer.
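
The check-then-insert race can be reproduced deterministically by interleaving two requests by hand. The sketch below is illustrative only: it uses Python's built-in sqlite3 module as a stand-in database, a simplified drive_entry schema, and hypothetical helper names (directory_exists, insert_directory) that are not taken from the real backend.

```python
import sqlite3

# Assumed, simplified schema. Note there is NO unique index on (drive_id, path).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_entry (id INTEGER PRIMARY KEY, drive_id INTEGER, path TEXT)"
)

def directory_exists(drive_id, path):
    """Hypothetical existence check performed before inserting."""
    row = conn.execute(
        "SELECT 1 FROM drive_entry WHERE drive_id = ? AND path = ?",
        (drive_id, path),
    ).fetchone()
    return row is not None

def insert_directory(drive_id, path):
    """Hypothetical plain insert with no conflict handling."""
    conn.execute(
        "INSERT INTO drive_entry (drive_id, path) VALUES (?, ?)", (drive_id, path)
    )

# Interleave two finalize requests, A and B: both run their existence
# check before either one has inserted anything.
a_sees_dir = directory_exists(1, "/A/B")
b_sees_dir = directory_exists(1, "/A/B")
if not a_sees_dir:
    insert_directory(1, "/A/B")
if not b_sees_dir:
    insert_directory(1, "/A/B")

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM drive_entry WHERE drive_id = 1 AND path = '/A/B'"
).fetchone()[0]
print(duplicate_count)  # 2: both requests inserted the same path
```

Because nothing at the storage layer rejects the second insert, the duplicate row survives and later queries by (drive_id, path) can return either row.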

Requirements

Functional requirements (R1–R2)

| ID | Requirement |
| --- | --- |
| R1 | Directory creation must be idempotent; multiple concurrent requests for the same path must result in exactly one database entry. |
| R2 | The system must support deeply nested folder uploads (6,000+ files) without data loss or structural corruption. |

Non-functional requirements (NR1–NR2)

| ID | Requirement |
| --- | --- |
| NR1 | Concurrency: The backend must handle high-concurrency TUS finalization without deadlocks or RecordNotUnique errors. |
| NR2 | Observability: Large-scale uploads must be verifiable via E2E tests with reliable synchronization. |

Constraints (C1)

| ID | Constraint |
| --- | --- |
| C1 | Database Integrity: The database must enforce path uniqueness per drive to prevent invalid states. |

Cross-cutting challenges (CC1)

| ID | Challenge | Mitigation |
| --- | --- | --- |
| CC1 | E2E Test Duration: 1.7 GB uploads are too slow for standard CI. | Use a tiered testing strategy: fast "Mini-Instagram" mocks for PRs and full stress tests for nightly runs. |

Decision

  1. Database Constraint: Add a unique index to the drive_entry table on (drive_id, path). This provides a "hard" guarantee of path uniqueness at the storage layer.
  2. Idempotent Insertion: Update the backend directory creation logic (ensure_parent_directories_exist) to use ON CONFLICT (drive_id, path) DO NOTHING. When another request has already created the row, the insert becomes a no-op instead of raising an error, so nothing is returned to the client as a failure.
  3. Conditional ACL Sync: Perform Access Control List (ACL) inheritance and synchronization only if this request actually inserted the directory entry, as determined from the result of the ON CONFLICT insert.
  4. E2E Synchronization Fix: Refactor E2E tests to initialize TUS finalize listeners before triggering the file upload, so that early completion events for small files are not missed.
  5. Fixture Accuracy: Ensure E2E file counts include all hidden/system files (e.g., .DS_Store) that Playwright actually uploads, preventing premature test resolution.
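
Decisions 1 to 3 can be sketched together as follows. This is a minimal illustration under stated assumptions, not the backend's actual code: it uses Python's sqlite3 module (SQLite 3.24 or newer is needed for ON CONFLICT ... DO NOTHING) as a stand-in database, a simplified drive_entry schema, an assumed index name, and hypothetical helpers (ancestor_paths, sync_acl). Only the function name ensure_parent_directories_exist and the conflict target (drive_id, path) come from this ADR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_entry ("
    "id INTEGER PRIMARY KEY, drive_id INTEGER NOT NULL, path TEXT NOT NULL)"
)
# Decision 1: uniqueness enforced at the storage layer (index name is assumed).
conn.execute(
    "CREATE UNIQUE INDEX idx_drive_entry_drive_path ON drive_entry (drive_id, path)"
)

acl_synced = []  # stand-in for the real ACL inheritance side effect

def sync_acl(drive_id, path):
    """Hypothetical placeholder for ACL inheritance and synchronization."""
    acl_synced.append(path)

def ancestor_paths(dir_path):
    """Expand '/A/B/C' into ['/A', '/A/B', '/A/B/C']."""
    parts = [p for p in dir_path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]

def ensure_parent_directories_exist(drive_id, dir_path):
    """Insert every missing ancestor; return the paths this call created."""
    created = []
    for path in ancestor_paths(dir_path):
        # Decision 2: a conflicting insert is silently skipped.
        cur = conn.execute(
            "INSERT INTO drive_entry (drive_id, path) VALUES (?, ?) "
            "ON CONFLICT (drive_id, path) DO NOTHING",
            (drive_id, path),
        )
        # Decision 3: rowcount is 1 only if this call inserted the row.
        if cur.rowcount == 1:
            sync_acl(drive_id, path)
            created.append(path)
    conn.commit()
    return created

first = ensure_parent_directories_exist(1, "/A/B/C")
second = ensure_parent_directories_exist(1, "/A/B/C")
print(first)   # ['/A', '/A/B', '/A/B/C']
print(second)  # []
```

The second call inserts nothing and triggers no ACL work, which is the idempotency R1 asks for. How "was a row inserted?" is detected depends on the database and driver; on PostgreSQL, for example, a RETURNING clause is one way to do it, since a skipped insert returns no rows.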

Rejected alternatives

| Alternative | Why rejected |
| --- | --- |
| Application-level Locking | Distributed locking (e.g., Redis) or mutexes would introduce significant performance overhead and complexity compared to native database ON CONFLICT handling. |
| Sequential Uploads | Disabling concurrency in the frontend would severely degrade user experience for large folder uploads. |

Consequences

Positive

  • Robustness: Eliminates the primary cause of "missing" folders and duplicate entries during large uploads.
  • Performance: Database-level ON CONFLICT resolves the conflict inside a single INSERT statement, so no separate existence check or application-level lock is needed.
  • Testability: E2E tests for large datasets are now stable and deterministic.

Negative

  • Migration Complexity: Requires a one-time cleanup of existing duplicate entries in the drive_entry table before the unique index can be applied.
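
One possible shape for that cleanup is sketched below, again with Python's sqlite3 module as a stand-in and a simplified schema. Keeping the row with the lowest id in each (drive_id, path) group is an assumption made for illustration; this ADR does not say which duplicate should survive, and a real migration would also have to re-point any rows that reference the deleted entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_entry ("
    "id INTEGER PRIMARY KEY, drive_id INTEGER NOT NULL, path TEXT NOT NULL)"
)
# Sample data containing duplicates left behind by the race.
conn.executemany(
    "INSERT INTO drive_entry (drive_id, path) VALUES (?, ?)",
    [(1, "/A"), (1, "/A"), (1, "/A/B"), (2, "/A"), (1, "/A/B"), (1, "/A/B")],
)

# Step 1: delete every row except the lowest id per (drive_id, path).
deleted = conn.execute(
    "DELETE FROM drive_entry WHERE id NOT IN ("
    "SELECT MIN(id) FROM drive_entry GROUP BY drive_id, path)"
).rowcount

# Step 2: the unique index can only be created once no duplicates remain.
conn.execute(
    "CREATE UNIQUE INDEX idx_drive_entry_drive_path ON drive_entry (drive_id, path)"
)
conn.commit()

remaining = conn.execute(
    "SELECT drive_id, path FROM drive_entry ORDER BY drive_id, path"
).fetchall()
print(deleted)    # 3
print(remaining)  # [(1, '/A'), (1, '/A/B'), (2, '/A')]
```

Running the index creation before the cleanup would fail on the duplicate rows, which is why the order of the two steps matters.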

Neutral

  • E2E Runtime: Stress tests remain long-running and require dedicated timeouts.

Verification (optional)

| Scenario | Expected |
| --- | --- |
| Concurrent upload of 100 files to /A/B/C | Exactly one entry for /A, /A/B, and /A/B/C in drive_entry. |
| Full Instagram export upload (6,184 files) | E2E test passes; all manifest folders (e.g., preferences) are present and navigable. |