ADR 34: Idempotent Directory Creation and Concurrency
Status: Accepted
Date: 2026-06-03
Last Updated: 2026-06-03
Terms (this ADR)
| ID | Term | Meaning |
|---|---|---|
| Rn | Functional requirement | Numbered obligation the system must meet (R1, R2, …). |
| NRn | Non-functional requirement | Quality attribute: security, performance, operability (NR1, NR2, …). |
| Cn | Constraint | Non-negotiable boundary; violating it invalidates the decision (C1, C2, …). |
| CCn | Cross-cutting challenge | Risk or tension that spans components, with a documented mitigation (CC1, …). |
| TUS | Resumable Uploads | Protocol for resumable file uploads (see ADR 26). |
Canonical product vocabulary: Glossary.
Context
Deeply nested folder uploads (e.g., Instagram JSON exports with ~6,000 files and 5+ levels of nesting) trigger a high volume of concurrent TUS finalize requests. Each request ensures its parent directory tree exists in the database.
Without strict idempotency and database-level constraints, concurrent requests for files in the same directory would attempt to create the same directory entry simultaneously. This led to:
- Race Conditions: Multiple threads checked for a directory's existence, found it missing, and each attempted to insert it.
- Duplicate Entries: The drive_entry table allowed multiple rows with the same (drive_id, path), causing inconsistent tree listings and UI "truncation" where folders appeared missing because the API returned an unexpected duplicate or the wrong entry was indexed.
- E2E Flakiness: Large-scale tests were unreliable due to synchronization issues between the frontend upload queue and the backend persistence layer.
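The check-then-insert race described above can be reproduced deterministically. The following is a minimal sketch, not the production code: it uses SQLite from Python's standard library as a stand-in database, the id column and the helper names directory_exists and insert_directory are hypothetical, and the two "requests" are interleaved by hand rather than run on real threads.

```python
import sqlite3

# Hypothetical minimal schema with no uniqueness on (drive_id, path),
# mirroring the pre-fix state described in this section.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_entry ("
    "id INTEGER PRIMARY KEY, drive_id INTEGER NOT NULL, path TEXT NOT NULL)"
)


def directory_exists(drive_id, path):
    """Return True if a row for (drive_id, path) is already present."""
    row = conn.execute(
        "SELECT 1 FROM drive_entry WHERE drive_id = ? AND path = ?",
        (drive_id, path),
    ).fetchone()
    return row is not None


def insert_directory(drive_id, path):
    """Unconditionally insert a directory row."""
    conn.execute(
        "INSERT INTO drive_entry (drive_id, path) VALUES (?, ?)",
        (drive_id, path),
    )


# Two finalize requests interleave: both run the existence check
# before either one performs its insert.
request_a_found = directory_exists(1, "/A")
request_b_found = directory_exists(1, "/A")

if not request_a_found:
    insert_directory(1, "/A")
if not request_b_found:
    insert_directory(1, "/A")

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM drive_entry WHERE drive_id = 1 AND path = '/A'"
).fetchone()[0]
print(duplicate_count)  # 2: both requests inserted the same path
```

Because nothing at the storage layer rejects the second insert, the application-level check alone cannot prevent the duplicate.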
Requirements
Functional requirements (R1–R2)
| ID | Requirement |
|---|---|
| R1 | Directory creation must be idempotent; multiple concurrent requests for the same path must result in exactly one database entry. |
| R2 | The system must support deeply nested folder uploads (6,000+ files) without data loss or structural corruption. |
Non-functional requirements (NR1–NR2)
| ID | Requirement |
|---|---|
| NR1 | Concurrency: The backend must handle high-concurrency TUS finalization without deadlocks or RecordNotUnique errors. |
| NR2 | Observability: Large-scale uploads must be verifiable via E2E tests with reliable synchronization. |
Constraints (C1)
| ID | Constraint |
|---|---|
| C1 | Database Integrity: The database must enforce path uniqueness per drive to prevent invalid states. |
Cross-cutting challenges (CC1)
| ID | Challenge | Mitigation |
|---|---|---|
| CC1 | E2E Test Duration: 1.7 GB uploads are too slow for standard CI. | Use a tiered testing strategy: fast "Mini-Instagram" mocks for PRs and full stress tests for nightly runs. |
Decision
- Database Constraint: Add a unique index to the drive_entry table on (drive_id, path). This provides a "hard" guarantee of path uniqueness at the storage layer.
- Idempotent Insertion: Update the backend directory creation logic (ensure_parent_directories_exist) to use ON CONFLICT (drive_id, path) DO NOTHING. This allows concurrent requests to safely "fail" their insert if another request succeeded, without returning an error to the client.
- Conditional ACL Sync: Only perform Access Control List (ACL) inheritance and synchronization if the directory entry was actually inserted (checking the result of the ON CONFLICT operation).
- E2E Synchronization Fix: Refactor E2E tests to initialize TUS finalize listeners before triggering the file upload to prevent missing early completion events for small files.
- Fixture Accuracy: Ensure E2E file counts include all hidden/system files (e.g., .DS_Store) that Playwright actually uploads, preventing premature test resolution.
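The first three decisions (unique index, idempotent insert, conditional ACL sync) can be sketched together. This is an illustration under stated assumptions, not the actual backend: SQLite (3.24 or later, which supports this ON CONFLICT syntax) stands in for the production database, the id column, the index name idx_drive_entry_drive_path, the helper ancestor_directories, and the sync_acl callback are hypothetical, and repeated sequential calls stand in for concurrent finalize requests.

```python
import sqlite3


def ancestor_directories(file_path):
    """Return every ancestor directory of file_path, shallowest first."""
    parts = [p for p in file_path.split("/") if p][:-1]  # drop the file name
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def ensure_parent_directories_exist(conn, drive_id, file_path, sync_acl):
    """Insert missing ancestor directories; call sync_acl only for new rows."""
    for path in ancestor_directories(file_path):
        cur = conn.execute(
            "INSERT INTO drive_entry (drive_id, path) VALUES (?, ?) "
            "ON CONFLICT (drive_id, path) DO NOTHING",
            (drive_id, path),
        )
        # rowcount is 1 when this call created the row and 0 when the
        # conflict clause skipped the insert.
        if cur.rowcount == 1:
            sync_acl(drive_id, path)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_entry ("
    "id INTEGER PRIMARY KEY, drive_id INTEGER NOT NULL, path TEXT NOT NULL)"
)
# The unique index is what ON CONFLICT (drive_id, path) resolves against.
conn.execute(
    "CREATE UNIQUE INDEX idx_drive_entry_drive_path "
    "ON drive_entry (drive_id, path)"
)

acl_synced = []  # records which paths triggered the (hypothetical) ACL sync
for n in range(100):
    ensure_parent_directories_exist(
        conn, 1, f"/A/B/C/file_{n}.json", lambda d, p: acl_synced.append(p)
    )

rows = [r[0] for r in conn.execute("SELECT path FROM drive_entry ORDER BY path")]
print(rows)        # ['/A', '/A/B', '/A/B/C']
print(acl_synced)  # ['/A', '/A/B', '/A/B/C']
```

Only the call that actually creates a row runs the ACL step, so the other 99 calls per directory do no ACL work and raise no error.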
Rejected alternatives
| Alternative | Why rejected |
|---|---|
| Application-level Locking | Distributed locking (e.g., Redis) or mutexes would introduce significant performance overhead and complexity compared to native database ON CONFLICT handling. |
| Sequential Uploads | Disabling concurrency in the frontend would severely degrade user experience for large folder uploads. |
Consequences
Positive
- Robustness: Eliminates the primary cause of "missing" folders and duplicate entries during large uploads.
- Performance: Database-level ON CONFLICT is highly efficient and avoids expensive application-level checks.
- Testability: E2E tests for large datasets are now stable and deterministic.
Negative
- Migration Complexity: Requires a one-time cleanup of existing duplicate entries in the drive_entry table before the unique index can be applied.
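One possible shape for that cleanup is sketched below, again with SQLite as a stand-in. This ADR does not specify which duplicate should survive, so keeping the row with the lowest id is purely an assumption, as are the id column, the index name, and the function dedupe_and_add_unique_index. A real migration would also need to decide what happens to children or ACL records that reference the deleted rows, which this sketch does not address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_entry ("
    "id INTEGER PRIMARY KEY, drive_id INTEGER NOT NULL, path TEXT NOT NULL)"
)
# Seed data containing duplicates of (1, '/A') and (1, '/A/B').
conn.executemany(
    "INSERT INTO drive_entry (drive_id, path) VALUES (?, ?)",
    [(1, "/A"), (1, "/A"), (1, "/A/B"), (2, "/A"), (1, "/A/B"), (1, "/A")],
)
conn.commit()


def dedupe_and_add_unique_index(conn):
    """Delete duplicate (drive_id, path) rows, then add the unique index."""
    with conn:  # commit on success, roll back if any statement fails
        cur = conn.execute(
            "DELETE FROM drive_entry WHERE id NOT IN ("
            "SELECT MIN(id) FROM drive_entry GROUP BY drive_id, path)"
        )
        removed = cur.rowcount
        # Index creation would fail if any duplicates were still present.
        conn.execute(
            "CREATE UNIQUE INDEX idx_drive_entry_drive_path "
            "ON drive_entry (drive_id, path)"
        )
    return removed


removed = dedupe_and_add_unique_index(conn)
remaining = conn.execute(
    "SELECT drive_id, path FROM drive_entry ORDER BY drive_id, path"
).fetchall()
print(removed)    # 3
print(remaining)  # [(1, '/A'), (1, '/A/B'), (2, '/A')]
```

Running the delete and the index creation in the same transaction means a failed index build leaves the table unchanged rather than half-migrated.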
Neutral
- E2E Runtime: Stress tests remain long-running and require dedicated timeouts.
Verification (optional)
| Scenario | Expected |
|---|---|
| Concurrent upload of 100 files to /A/B/C | Exactly one entry for /A, /A/B, and /A/B/C in drive_entry. |
| Full Instagram export upload (6,184 files) | E2E test passes; all manifest folders (e.g., preferences) are present and navigable. |