ADR-0004: Store run state in a _privaci schema in the target database¶
Status¶
Accepted — 2026-05-28
Context¶
PrivaCI needs persistent state for:
- Resumability — a crashed 4-hour job should resume, not restart.
- Audit log — a record of which columns were masked with which action, plus auto-detect findings and warnings, queryable for years.
- Source-DB identity — a stable source_db_hash per source, used for run identity and resumability (any commercial billing use is out of scope for the engine; see ADR-0003).
- Drift detection (commercial v1.x) — comparing the current catalog snapshot to the last one.
- Compliance reports (commercial v1.x) — generated from audit-log history.
Where should this state live?
Alternatives considered:
- S3 / blob storage — external dependency, breaks air-gapped customers, adds another credential to manage.
- Redis / managed KV — extra moving part for a batch job.
- Container-local SQLite — state dies with the container; useless for CI/CD where every run starts in a fresh container.
- Source database — read-only by product policy.
- Customer-managed metadata DB (separate connection string) — configurable but onboarding friction; an additional secret to manage and an additional permission grant to request.
Decision¶
State lives in the target database, in a dedicated _privaci schema created idempotently on first run. Tables:
- _privaci.runs — one row per run.
- _privaci.table_checkpoints — one row per (run, table) for per-batch resumability.
- _privaci.audit_log — one row per masking event.
Customers grant CREATE SCHEMA and DML on _privaci.* to the
masking role. This is the only meaningful permission requirement
beyond standard DML on the masked tables.
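To make "created idempotently on first run" concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in target. It rests on several assumptions: every column layout is hypothetical (this ADR fixes only the three table names), the helper names are invented for illustration, and an attached in-memory database named _privaci plays the role of the schema because SQLite has no CREATE SCHEMA. On a real target the first statement would be that database's own idempotent schema-creation form.

```python
import sqlite3

# Hypothetical column layouts; the ADR only fixes the three table names.
DDL = [
    """CREATE TABLE IF NOT EXISTS _privaci.runs (
           run_id TEXT PRIMARY KEY,
           source_db_hash TEXT NOT NULL,
           status TEXT NOT NULL
       )""",
    """CREATE TABLE IF NOT EXISTS _privaci.table_checkpoints (
           run_id TEXT NOT NULL,
           table_name TEXT NOT NULL,
           last_batch INTEGER NOT NULL,
           PRIMARY KEY (run_id, table_name)
       )""",
    """CREATE TABLE IF NOT EXISTS _privaci.audit_log (
           run_id TEXT NOT NULL,
           table_name TEXT NOT NULL,
           column_name TEXT NOT NULL,
           action TEXT NOT NULL,
           row_count INTEGER NOT NULL
       )""",
]


def open_demo_target() -> sqlite3.Connection:
    """Open an in-memory database with an attached database named _privaci.

    The attached database stands in for the schema in this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS _privaci")
    return conn


def ensure_state_schema(conn: sqlite3.Connection) -> None:
    """Create the state tables if they are missing; safe to call on every run."""
    for statement in DDL:
        conn.execute(statement)
    conn.commit()


def list_state_tables(conn: sqlite3.Connection) -> list:
    """Return the names of the tables currently present in _privaci, sorted."""
    rows = conn.execute(
        "SELECT name FROM _privaci.sqlite_master "
        "WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Calling ensure_state_schema twice on the same connection leaves the same three tables in place, which is the property the first-run path relies on.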
Consequences¶
Why this is a strong choice¶
- One unified location for resumability, audit, drift, and reports.
- No external dependencies — PrivaCI stays a single container with two DB connections and nothing else.
- Co-locates audit data with the masked data it describes. Customer auditors query one database with the SQL skills they already have.
- Survives container restarts — central to the resumability promise.
- Air-gap friendly — works in environments with no internet egress.
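As an illustration of the resumability point, the sketch below shows how a restarted container might decide where to pick up, given checkpoint rows read back from _privaci.table_checkpoints. The function name, the batch-index convention, and the shape of the inputs are all assumptions made for this example; the ADR does not specify how checkpoints are encoded.

```python
from typing import Dict, List, Tuple


def plan_resume(
    tables: List[str],
    total_batches: Dict[str, int],
    checkpoints: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Return (table, next_batch) pairs for the work still outstanding.

    checkpoints maps a table name to the index of the last batch that was
    committed for the run being resumed (hypothetical encoding). Tables with
    no checkpoint row start at batch 0; fully processed tables are skipped.
    """
    plan = []
    for table in tables:
        next_batch = checkpoints.get(table, -1) + 1
        if next_batch < total_batches[table]:
            plan.append((table, next_batch))
    return plan
```

Because the checkpoints live in the target database rather than in the container, a fresh container can rebuild this plan from a single query.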
Trade-offs accepted¶
- Customer must grant CREATE SCHEMA. Documented in quickstart. Pre-flight check fails with a clear error if the grant is missing.
- State and masked data share a database. A catastrophic target failure loses both. Acceptable because target databases are by definition staging/dev, and the source remains untouched.
- Schema version drift — engine v2 reading a _privaci schema written by v1 must apply additive migrations. The state-management module owns this.
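The schema-version trade-off can be sketched as a small migration planner. Everything here is hypothetical: the registry, the version numbers, and the SQL statements are invented to show the idea of additive-only upgrades and are not the state-management module's actual interface.

```python
from typing import List, Tuple

# Hypothetical migration registry: (version, statements). Every statement is
# additive (a new table or a new column), never a drop or a rename.
MIGRATIONS: List[Tuple[int, List[str]]] = [
    (1, ["CREATE TABLE IF NOT EXISTS _privaci.runs (run_id TEXT PRIMARY KEY)"]),
    (2, ["ALTER TABLE _privaci.runs ADD COLUMN engine_version TEXT"]),
    (3, ["ALTER TABLE _privaci.audit_log ADD COLUMN warning TEXT"]),
]


def pending_migrations(stored_version: int) -> List[Tuple[int, List[str]]]:
    """Return the migrations newer than the version recorded in the target.

    Raises RuntimeError if the stored schema is newer than this engine knows,
    since an older engine cannot safely interpret a newer layout.
    """
    latest = MIGRATIONS[-1][0]
    if stored_version > latest:
        raise RuntimeError(
            f"_privaci schema version {stored_version} is newer than "
            f"the latest known version {latest}"
        )
    return [(v, stmts) for v, stmts in MIGRATIONS if v > stored_version]
```

Keeping migrations additive means state written by an older engine stays readable after the upgrade, which matters for an audit log that is meant to be queryable for years.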
Forbidden patterns¶
- The _privaci schema is not for customer use. Application code must not read from or write to it. The pre-flight target-empty check excludes _privaci to avoid false positives on subsequent runs.
- We never write input PII to any _privaci table. Audit entries reference columns by name and counts, not values. The salt is stored as a 16-byte fingerprint, never as the salt itself.
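A minimal sketch of the no-PII rule, assuming a hypothetical AuditEntry record and a BLAKE2b-based fingerprint. The ADR fixes only the fingerprint length (16 bytes); the hash function, the field names, and the helper name are illustrative choices, not the engine's confirmed design.

```python
import hashlib
from dataclasses import dataclass


def salt_fingerprint(salt: bytes) -> bytes:
    """Return a 16-byte one-way fingerprint of the salt.

    BLAKE2b with digest_size=16 is an assumption; any one-way hash truncated
    or sized to 16 bytes would satisfy the rule stated above.
    """
    return hashlib.blake2b(salt, digest_size=16).digest()


@dataclass(frozen=True)
class AuditEntry:
    """One masking event: names and counts only, no column values."""

    table_name: str
    column_name: str
    action: str
    row_count: int
    salt_fingerprint: bytes


def make_audit_entry(
    table_name: str, column_name: str, action: str, row_count: int, salt: bytes
) -> AuditEntry:
    # The raw salt is used only to derive the fingerprint and is not stored.
    return AuditEntry(
        table_name=table_name,
        column_name=column_name,
        action=action,
        row_count=row_count,
        salt_fingerprint=salt_fingerprint(salt),
    )
```

The fingerprint lets an auditor confirm that two runs used the same salt without the salt ever being recoverable from the audit log.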