Catalog vs blockstore storage
Last Updated: 2026-06-03
How PostgreSQL catalog size compares to blockstore (file bytes) for capacity planning on self-hosted and SaaS deployments. Design context: ADR 22 (dual persistence), ADR 30 (catalog plane), ADR 32 (quotas vs node disk).
Two planes on disk
Substratum does not reserve a fixed percentage of disk for Postgres vs user files. The installer lays out separate directories; each grows independently.
| Plane | Self-hosted path | What it stores |
|---|---|---|
| Blockstore | {install_root}/data/blocks/ | Content-addressed file bytes (FlatFS sharding by CID prefix) |
| Catalog | {install_root}/postgres/ | PostgreSQL cluster: drives, paths, CIDs, ACLs, OAuth sessions, receipt-sync outbox, entitlements |
Production SaaS uses the same split: managed Postgres for catalog, S3-compatible object storage for blocks (ADR 08).
Third plane (not in postgres/ or data/blocks/): passport receipts are also written to the owner’s PDS repo for mesh provenance (ADR 27, ADR 28). Capacity planning for “all metadata” should include PDS growth if you operate or export repos.
Why the ratio changes with file size
Catalog cost is roughly per file (and per folder row), not per byte of payload:
- One drive_entry row per file (path, asset_cid, receipt_cid, size, ACL sync state).
- One receipt_sync_outbox row per receipt job (includes the serialized receipt in payload_json; rows are marked synced but not deleted today).
- passport and junction ACL rows after ReceiptSyncWorker completes.
Blockstore cost scales with file size (plus negligible overhead for multi-chunk manifests when files exceed SWARM_MAX_BLOCK_BYTES (25 MiB) — see Glossary and ingress chunking in ADR 32).
So:
- Large files → catalog : blockstore ratio approaches zero.
- Small files → metadata can be larger than the blob on disk.
Order-of-magnitude per file
Let M be the catalog bytes attributed to one file (rows + indexes + one retained outbox row) and F the file's bytes in the blockstore.
| Component | Typical size (order of magnitude) |
|---|---|
| drive_entry + indexes | ~0.5–2 KiB (path length dominates) |
| receipt_sync_outbox (payload_json) | ~1–2 KiB (retained after sync) |
| passport + passport_access_control | ~0.3–1 KiB + ~0.1–0.2 KiB per grantee |
| Parent directory rows | Extra drive_entry rows for each missing path segment (metadata only) |
Rule of thumb: M ≈ 2–5 KiB per uploaded file in a typical home library (short paths, few grantees). Add Postgres cluster baseline (empty DB, WAL, migrations — often tens of MiB on a fresh install) when file count is low.
Approximate catalog share of on-node storage:
catalog_ratio ≈ M / (M + F) (ignore baseline for large libraries)
| File size F | Catalog ratio (illustrative) |
|---|---|
| 1 KiB | ~67–83% |
| 100 KiB | ~2–5% |
| 10 MiB | ~0.02–0.05% |
| 1 GiB | ~0.0002–0.0005% |
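The ratios in the table above can be recomputed for any M and F. The following is a minimal sketch, not part of Substratum: `catalog_ratio_pct` is a hypothetical helper name, and the 4 KiB value for M is an assumption taken from the middle of the rule-of-thumb range.

```shell
#!/bin/sh
# catalog_ratio_pct M_KIB F_KIB
# Hypothetical helper: prints M / (M + F) as a percentage with four decimals.
# Both arguments must use the same unit (KiB here); the baseline is ignored.
catalog_ratio_pct() {
    awk -v m="$1" -v f="$2" 'BEGIN { printf "%.4f\n", 100 * m / (m + f) }'
}

# Recompute the table rows for an assumed M of 4 KiB per file.
# File sizes are 1 KiB, 100 KiB, 10 MiB and 1 GiB, expressed in KiB.
for f_kib in 1 100 10240 1048576; do
    printf '%8s KiB -> %s%%\n' "$f_kib" "$(catalog_ratio_pct 4 "$f_kib")"
done
```

Because M is roughly constant per file, the same ratio applies to a whole library of equally sized files: the file count cancels out of M / (M + F).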
Example libraries
| Library | Assumption | Catalog (≈) | Blocks | Catalog % of total |
|---|---|---|---|---|
| Many small files | 10 000 × 50 KiB files, M ≈ 4 KiB each | ~40 MiB | ~500 MiB | ~7% |
| Few large files | 100 × 1 GiB files, M ≈ 4 KiB each | ~0.4 MiB | ~100 GiB | ~0.0004% |
Chunking does not change catalog row count: a 1 GiB file still has one drive_entry; the blockstore holds about 41 blocks of up to 25 MiB each plus a small JSON manifest block.
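The block count for a 1 GiB file follows from ceiling division by the block cap. The sketch below assumes files are split into fixed-size blocks of exactly SWARM_MAX_BLOCK_BYTES with one shorter final block; the actual chunker may behave differently, and `block_count` and `last_block_bytes` are hypothetical helper names. The manifest block is not counted.

```shell
#!/bin/sh
# Assumed cap: the 25 MiB figure quoted for SWARM_MAX_BLOCK_BYTES.
SWARM_MAX_BLOCK_BYTES=$((25 * 1024 * 1024))

# block_count FILE_BYTES: number of data blocks, rounded up (ceiling division).
block_count() {
    echo $(( ($1 + SWARM_MAX_BLOCK_BYTES - 1) / SWARM_MAX_BLOCK_BYTES ))
}

# last_block_bytes FILE_BYTES: size of the final, possibly partial, block.
last_block_bytes() {
    rem=$(( $1 % SWARM_MAX_BLOCK_BYTES ))
    if [ "$rem" -eq 0 ] && [ "$1" -gt 0 ]; then
        # The file is an exact multiple of the cap, so the last block is full.
        echo "$SWARM_MAX_BLOCK_BYTES"
    else
        echo "$rem"
    fi
}

one_gib=$((1024 * 1024 * 1024))
echo "blocks: $(block_count "$one_gib")"
echo "last block bytes: $(last_block_bytes "$one_gib")"
```

Under this assumption a 1 GiB file yields 40 full blocks and one 24 MiB block, which is where the figure of about 41 blocks comes from.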
What quotas measure
Do not confuse catalog size with upload quotas:
| Mechanism | What it limits | Where |
|---|---|---|
| SaaS quota_bytes / used_bytes | File bytes reserved and committed | Postgres entitlements (ADR 32) |
| Installer max_bytes / warn_bytes | Operator policy on the node blockstore path | Triangle manifest device.storage (ADR 23) |
Neither cap reserves space inside Postgres for catalog growth. On a home server, a full disk usually means the blockstore filled up before the catalog did, unless you store huge numbers of tiny files or the outbox/history tables grow without retention.
Measuring on a running system
Self-hosted (~/.substratum)
```shell
# Blockstore (user content)
du -sh ~/.substratum/data/blocks
# PostgreSQL data directory
du -sh ~/.substratum/postgres
```
Compare the two totals. For row-level sanity checks, connect to port 35432 (installer default) and inspect catalog size:
```shell
# Total size of the catalog database
psql "postgres://substratum@127.0.0.1:35432/substratum" -c \
"SELECT pg_size_pretty(pg_database_size(current_database()));"
# Ten largest tables, counting indexes and TOAST data
psql "postgres://substratum@127.0.0.1:35432/substratum" -c \
"SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"
```
(Adjust connection URL to match your substratum-gateway.json / local role.)
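To turn the two `du` totals into a single catalog share, something like the following could be used. This is a sketch only: `storage_split` is a hypothetical helper, not a Substratum command, and it measures directory sizes as `du` reports them, which can differ slightly from logical file sizes.

```shell
#!/bin/sh
# storage_split BLOCKS_DIR CATALOG_DIR
# Hypothetical helper: prints both directory sizes in KiB and the catalog's
# share of their sum as a percentage.
storage_split() {
    blocks_kib=$(du -sk "$1" | awk '{ print $1 }')
    catalog_kib=$(du -sk "$2" | awk '{ print $1 }')
    awk -v b="$blocks_kib" -v c="$catalog_kib" 'BEGIN {
        total = b + c
        # Guard against division by zero when both directories are empty.
        share = (total > 0) ? 100 * c / total : 0
        printf "blocks_kib=%d catalog_kib=%d catalog_share_pct=%.2f\n", b, c, share
    }'
}

# Usage on the default self-hosted layout:
# storage_split ~/.substratum/data/blocks ~/.substratum/postgres
```

If the PostgreSQL cluster runs as a different OS user, reading its data directory may need elevated permissions.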
Compose dev / production
| Environment | Blockstore | Catalog |
|---|---|---|
| Compose + FlatFS | ./data/blocks (see docker-compose.fs.yml) | Postgres volume / managed instance |
| Production | S3-compatible bucket | Managed PostgreSQL |
Sum object storage billed size and database disk separately; do not use used_bytes alone as total disk usage.
Operational implications
- Backup scope: Back up postgres/ (catalog + RLS) and data/blocks/ (or S3 bucket) together for a consistent restore story; PDS repos are separate if you rely on mesh provenance.
- Small-file workloads: Photo thumbnails, icons, or millions of tiny exports inflate catalog and outbox faster than blockstore; watch drive_entry count and receipt_sync_outbox table size.
- Large-file workloads: Video and disk images dominate blockstore; plan disk on data/blocks (or object storage), not Postgres sizing.
- Outbox retention: Synced outbox rows currently accumulate with full payload_json. Long-running nodes with heavy churn may need a future retention job (not shipped in v1); monitor receipt_sync_outbox growth.
- Desktop installer Step 3: max_bytes caps the node storage policy for the data root, not a Postgres-vs-blocks split (install layout).
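One way to watch the two tables named above is a periodic size query. The sketch below assumes drive_entry and receipt_sync_outbox live in a schema on the connecting role's search_path; `catalog_watch_sql` is a hypothetical helper that only prints the SQL, so it can be piped to `psql` or reviewed first.

```shell
#!/bin/sh
# catalog_watch_sql: hypothetical helper that prints a query reporting the row
# count and total on-disk size (table + indexes + TOAST) of the two tables
# that grow with file count.
catalog_watch_sql() {
    cat <<'SQL'
SELECT 'drive_entry' AS table_name,
       (SELECT count(*) FROM drive_entry) AS row_count,
       pg_size_pretty(pg_total_relation_size('drive_entry')) AS total_size
UNION ALL
SELECT 'receipt_sync_outbox',
       (SELECT count(*) FROM receipt_sync_outbox),
       pg_size_pretty(pg_total_relation_size('receipt_sync_outbox'));
SQL
}

# Usage, assuming the installer-default connection URL:
# catalog_watch_sql | psql "postgres://substratum@127.0.0.1:35432/substratum"
```

Note that `count(*)` scans the table, so on very large catalogs it may be preferable to run this infrequently.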
Delete lifecycle
User delete of a drive node (ADR 35) must touch all three planes and the mesh replicas:
- Catalog: drive_entry and passport index rows removed (library gone).
- PDS: file.deleted / entry.deleted async workers tombstone receipt and filesystem records (mesh authority).
- Blockstore + quota: used_bytes decremented on SaaS; local blockstore bytes removed when no catalog row still references the asset_cid.
- Global Triangle: mesh_unpin_outbox worker sends unpin to every PINNING_TARGETS peer; terminal failed rows are the DLQ when peers stay offline (symmetric durability to receipt sync).
Until ADR 35 is implemented, delete is catalog-only and does not free quota, local blocks, or triangle replicas.
Related
- ADR 35: Drive Node Delete
- Self-hosted troubleshooting — install paths and logs
- Production deployment — Postgres + S3 topology
- Glossary: Swarm block cap — 25 MiB mesh blocks vs upload ceilings
- ADR 24: Installer ports and layout