Catalog vs blockstore storage

Last Updated: 2026-06-03

How PostgreSQL catalog size compares to blockstore (file bytes) for capacity planning on self-hosted and SaaS deployments. Design context: ADR 22 (dual persistence), ADR 30 (catalog plane), ADR 32 (quotas vs node disk).

Two planes on disk

Substratum does not reserve a fixed percentage of disk for Postgres vs user files. The installer lays out separate directories; each grows independently.

| Plane | Self-hosted path | What it stores |
| --- | --- | --- |
| Blockstore | {install_root}/data/blocks/ | Content-addressed file bytes (FlatFS sharding by CID prefix) |
| Catalog | {install_root}/postgres/ | PostgreSQL cluster: drives, paths, CIDs, ACLs, OAuth sessions, receipt-sync outbox, entitlements |

Production SaaS uses the same split: managed Postgres for catalog, S3-compatible object storage for blocks (ADR 08).

Third plane (not in postgres/ or data/blocks/): passport receipts are also written to the owner’s PDS repo for mesh provenance (ADR 27, ADR 28). Capacity planning for “all metadata” should include PDS growth if you operate or export repos.

Why the ratio changes with file size

Catalog cost is roughly per file (and per folder row), not per byte of payload:

  • One drive_entry row per file (path, asset_cid, receipt_cid, size, ACL sync state).
  • One receipt_sync_outbox row per receipt job (includes serialized receipt in payload_json; rows are marked synced but not deleted today).
  • passport and junction ACL rows after ReceiptSyncWorker completes.

Blockstore cost scales with file size. Files larger than SWARM_MAX_BLOCK_BYTES (25 MiB) add a negligible overhead for the multi-chunk manifest; see the Glossary and ingress chunking in ADR 32.

So:

  • Large files → catalog : blockstore ratio approaches zero.
  • Small files → metadata can be larger than the blob on disk.

Order-of-magnitude per file

Let M be the catalog bytes attributed to one file (rows + indexes + one retained outbox row) and F the file's bytes in the blockstore.

| Component | Typical size (order of magnitude) |
| --- | --- |
| drive_entry + indexes | ~0.5–2 KiB (path length dominates) |
| receipt_sync_outbox (payload_json) | ~1–2 KiB (retained after sync) |
| passport + passport_access_control | ~0.3–1 KiB + ~0.1–0.2 KiB per grantee |
| Parent directory rows | Extra drive_entry rows for each missing path segment (metadata only) |

Rule of thumb: M ≈ 2–5 KiB per uploaded file in a typical home library (short paths, few grantees). Add Postgres cluster baseline (empty DB, WAL, migrations — often tens of MiB on a fresh install) when file count is low.
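Summing the component ranges from the table is a quick cross-check of the rule of thumb. The sketch below is illustrative only: `per_file_catalog_kib` is a hypothetical helper, the bounds are the order-of-magnitude figures quoted above rather than measurements, and it deliberately ignores parent directory rows and the cluster baseline.

```python
# Order-of-magnitude bounds in KiB, copied from the component table above.
DRIVE_ENTRY_KIB = (0.5, 2.0)   # drive_entry + indexes
OUTBOX_KIB = (1.0, 2.0)        # one retained receipt_sync_outbox row
PASSPORT_KIB = (0.3, 1.0)      # passport + passport_access_control base cost
PER_GRANTEE_KIB = (0.1, 0.2)   # extra ACL cost per grantee


def per_file_catalog_kib(grantees: int = 0) -> tuple[float, float]:
    """Return (low, high) catalog KiB for one file. Hypothetical helper."""
    if grantees < 0:
        raise ValueError("grantees must be non-negative")
    low = DRIVE_ENTRY_KIB[0] + OUTBOX_KIB[0] + PASSPORT_KIB[0] + grantees * PER_GRANTEE_KIB[0]
    high = DRIVE_ENTRY_KIB[1] + OUTBOX_KIB[1] + PASSPORT_KIB[1] + grantees * PER_GRANTEE_KIB[1]
    return low, high


low, high = per_file_catalog_kib(grantees=2)
print(f"M is about {low:.1f} to {high:.1f} KiB per file")
```

With no grantees the bounds come out at roughly 1.8 to 5 KiB, which is in line with the 2–5 KiB rule of thumb.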

Approximate catalog share of on-node storage:

text
catalog_ratio ≈ M / (M + F)     (ignore baseline for large libraries)

| File size F | Catalog ratio (illustrative, M ≈ 2–5 KiB) |
| --- | --- |
| 1 KiB | ~67–83% |
| 100 KiB | ~2–5% |
| 10 MiB | ~0.02–0.05% |
| 1 GiB | ~0.0002–0.0005% |
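The ratio is easy to evaluate for other file sizes. A minimal sketch, assuming the rule-of-thumb range M ≈ 2–5 KiB and ignoring the Postgres baseline; the function name `catalog_ratio` simply mirrors the formula and is not part of any Substratum API.

```python
KIB = 1024


def catalog_ratio(m_bytes: float, f_bytes: float) -> float:
    """Catalog share of on-node storage for one file: M / (M + F)."""
    if m_bytes < 0 or f_bytes < 0 or m_bytes + f_bytes == 0:
        raise ValueError("sizes must be non-negative and not both zero")
    return m_bytes / (m_bytes + f_bytes)


sizes = [
    ("1 KiB", KIB),
    ("100 KiB", 100 * KIB),
    ("10 MiB", 10 * KIB**2),
    ("1 GiB", KIB**3),
]
for label, f_bytes in sizes:
    low = catalog_ratio(2 * KIB, f_bytes)   # M at the low end of the rule of thumb
    high = catalog_ratio(5 * KIB, f_bytes)  # M at the high end
    print(f"{label:>8}: {low:.4%} to {high:.4%}")
```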

Example libraries

| Library | Assumption | Catalog (≈) | Blocks | Catalog % of total |
| --- | --- | --- | --- | --- |
| Many small files | 10 000 × 50 KiB files, M ≈ 4 KiB each | ~40 MiB | ~500 MiB | ~7% |
| Few large files | 100 × 1 GiB files, M ≈ 4 KiB each | ~0.4 MiB | ~100 GiB | ~0.0004% |

Chunking does not change catalog row count: a 1 GiB file still has one drive_entry; the blockstore holds 41 blocks of up to 25 MiB each (40 full blocks plus a 24 MiB remainder) and a small JSON manifest block.
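The example libraries and the block count can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: `chunk_count` and `library_estimate` are hypothetical helpers, M is fixed at 4 KiB as in the table, the manifest block and Postgres baseline are ignored, and an empty file is assumed to occupy one block.

```python
import math

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
SWARM_MAX_BLOCK_BYTES = 25 * MIB  # value quoted in the text above


def chunk_count(file_bytes: int, block_bytes: int = SWARM_MAX_BLOCK_BYTES) -> int:
    """Data blocks needed for one file (manifest block not counted)."""
    return max(1, math.ceil(file_bytes / block_bytes))


def library_estimate(file_count: int, file_bytes: int, m_bytes: int = 4 * KIB) -> dict:
    """Estimate catalog and blockstore totals for a uniform library."""
    catalog = file_count * m_bytes
    blocks = file_count * file_bytes
    return {
        "catalog_bytes": catalog,
        "block_bytes": blocks,
        "catalog_share": catalog / (catalog + blocks),
        "data_blocks": file_count * chunk_count(file_bytes),
    }


for name, count, size in [("many small", 10_000, 50 * KIB), ("few large", 100, GIB)]:
    est = library_estimate(count, size)
    print(
        f"{name}: catalog {est['catalog_bytes'] / MIB:.1f} MiB, "
        f"blocks {est['block_bytes'] / MIB:.0f} MiB, "
        f"share {est['catalog_share']:.4%}, data blocks {est['data_blocks']}"
    )
```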

What quotas measure

Do not confuse catalog size with upload quotas:

| Mechanism | What it limits | Where |
| --- | --- | --- |
| SaaS quota_bytes / used_bytes | File bytes reserved and committed | Postgres entitlements (ADR 32) |
| Installer max_bytes / warn_bytes | Operator policy on the node blockstore path | Triangle manifest device.storage (ADR 23) |

Neither cap reserves space inside Postgres for catalog growth. On a home server, a full disk usually means the blockstore filled up before the catalog did, unless you store huge numbers of tiny files or the outbox/history tables grow without retention.

Measuring on a running system

Self-hosted (~/.substratum)

bash
# Blockstore (user content)
du -sh ~/.substratum/data/blocks

# PostgreSQL data directory
du -sh ~/.substratum/postgres

Compare the two totals. For row-level sanity checks, connect to port 35432 (installer default) and inspect catalog size:

bash
psql "postgres://substratum@127.0.0.1:35432/substratum" -c \
  "SELECT pg_size_pretty(pg_database_size(current_database()));"
psql "postgres://substratum@127.0.0.1:35432/substratum" -c \
  "SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
   FROM pg_catalog.pg_statio_user_tables
   ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"

(Adjust connection URL to match your substratum-gateway.json / local role.)
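To track the ratio over time rather than eyeballing two `du` totals, the same comparison can be scripted. A minimal sketch, assuming the default `~/.substratum` layout shown above; `dir_bytes` and `catalog_share` are hypothetical helpers. It sums apparent file sizes, so results differ slightly from `du`, which reports allocated blocks, and it silently skips anything the current user cannot read (the Postgres data directory is often restricted to its own OS user).

```python
import os
from pathlib import Path


def dir_bytes(root: Path) -> int:
    """Sum apparent file sizes under root; missing directories count as 0."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable (e.g. a recycled WAL segment).
                continue
    return total


def catalog_share(install_root: Path) -> float:
    """Catalog bytes as a fraction of catalog + blockstore bytes."""
    blocks = dir_bytes(install_root / "data" / "blocks")
    catalog = dir_bytes(install_root / "postgres")
    total = blocks + catalog
    return catalog / total if total else 0.0


if __name__ == "__main__":
    root = Path.home() / ".substratum"
    print(f"catalog share of on-node storage: {catalog_share(root):.2%}")
```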

Compose dev / production

| Environment | Blockstore | Catalog |
| --- | --- | --- |
| Compose + FlatFS | ./data/blocks (see docker-compose.fs.yml) | Postgres volume / managed instance |
| Production | S3-compatible bucket | Managed PostgreSQL |

Sum object storage billed size and database disk separately; do not use used_bytes alone as total disk usage.

Operational implications

  1. Backup scope: Back up postgres/ (catalog + RLS) and data/blocks/ (or S3 bucket) together for a consistent restore story; PDS repos are separate if you rely on mesh provenance.
  2. Small-file workloads: Photo thumbnails, icons, or millions of tiny exports inflate catalog and outbox faster than blockstore — watch drive_entry count and receipt_sync_outbox table size.
  3. Large-file workloads: Video and disk images dominate blockstore; plan disk on data/blocks (or object storage), not Postgres sizing.
  4. Outbox retention: Synced outbox rows currently accumulate with full payload_json. Long-running nodes with heavy churn may need a future retention job (not shipped in v1); monitor receipt_sync_outbox growth.
  5. Desktop installer Step 3: max_bytes caps the node storage policy for the data root, not a Postgres-vs-blocks split (install layout).
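For the outbox retention point above, a rough projection helps decide when monitoring should turn into action. This is a sketch, not a sizing tool: `projected_outbox_bytes` is a hypothetical helper, it assumes the ~1–2 KiB per retained row quoted earlier, and it ignores index and TOAST overhead.

```python
KIB = 1024
OUTBOX_ROW_KIB = (1.0, 2.0)  # retained payload_json row, order of magnitude


def projected_outbox_bytes(receipt_jobs_per_day: float, days: int) -> tuple[int, int]:
    """Return (low, high) bytes retained in receipt_sync_outbox after `days`."""
    if receipt_jobs_per_day < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    rows = receipt_jobs_per_day * days
    low = int(rows * OUTBOX_ROW_KIB[0] * KIB)
    high = int(rows * OUTBOX_ROW_KIB[1] * KIB)
    return low, high


# Example workload (assumed): 5 000 receipt jobs per day for one year.
low, high = projected_outbox_bytes(5_000, 365)
print(f"outbox after one year: {low / 1024**3:.1f} to {high / 1024**3:.1f} GiB")
```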

Delete lifecycle

User delete of a drive node (ADR 35) must touch four planes:

  1. Catalog: drive_entry and passport index rows removed (library gone).
  2. PDS: file.deleted / entry.deleted async workers tombstone receipt and filesystem records (mesh authority).
  3. Blockstore + quota: used_bytes decremented on SaaS; local blockstore bytes removed when no catalog row still references the asset_cid.
  4. Global Triangle: mesh_unpin_outbox worker sends unpin to every PINNING_TARGETS peer; terminal failed rows are the DLQ when peers stay offline (symmetric durability to receipt sync).

Until ADR 35 is implemented, delete is catalog-only and does not free quota, local blocks, or triangle replicas.
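The blockstore rule in the list above (remove local bytes only when no catalog row still references the asset_cid) amounts to a reference check at delete time. The sketch below models it in memory purely for illustration: `DriveEntry` and `delete_entry` are hypothetical names, the CIDs are placeholders, and a real implementation would presumably run this check against Postgres inside the delete transaction.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveEntry:
    path: str
    asset_cid: str


def delete_entry(entries: list[DriveEntry], path: str) -> tuple[list[DriveEntry], bool]:
    """Remove the entry at `path`; report whether its block is now unreferenced."""
    target = next((e for e in entries if e.path == path), None)
    if target is None:
        raise KeyError(path)
    remaining = [e for e in entries if e.path != path]
    refs = Counter(e.asset_cid for e in remaining)
    # The block may be removed only if no other catalog row points at the same CID.
    return remaining, refs[target.asset_cid] == 0


entries = [
    DriveEntry("/photos/a.jpg", "cid-1"),
    DriveEntry("/backup/a.jpg", "cid-1"),  # same content, same CID
    DriveEntry("/video/b.mp4", "cid-2"),
]
entries, removable = delete_entry(entries, "/photos/a.jpg")
print(removable)  # False: /backup/a.jpg still references cid-1
entries, removable = delete_entry(entries, "/backup/a.jpg")
print(removable)  # True: last reference to cid-1 is gone
```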