Configuring PrivaCI (mask-rules.yaml)¶
PrivaCI is driven by a single declarative YAML file, mounted in the container at
/config/mask-rules.yaml (override with --config). The engine validates this
file strictly on boot: unknown keys, misspelled actions, and unsupported
versions fail fast with exit code 3
before any database connection is opened.
This page is the operator reference for every field. The same schema is available machine-readable for editor auto-completion:
privaci schema config > mask-rules.schema.json
Quickstart¶
A minimal config that masks two columns on one table:
version: "1.0"
global_salt: "${ANONYMIZATION_SALT}"
tables:
public.users:
strategy: transform
columns:
first_name: { action: fake, provider: first_name }
email: { action: fake, provider: email }
Validate it without running a masking job:
privaci validate --config mask-rules.yaml
# Config mask-rules.yaml is valid.
A failing example — a misspelled action — exits 3 and names the column path:
privaci validate --config mask-rules.yaml
# Context: Validating mask-rules.yaml
# Cause: Config does not satisfy the schema:
# - tables.users.columns.email: ...
# Remediation: Fix the fields listed above; see docs/configuration.md.
Top-level options¶
| Key | Type | Default | Description |
|---|---|---|---|
version |
string | required | Config schema version. The v1.x engine accepts "1.0" only. |
global_salt |
string | secret URI | none | Salt literal or secret URI; resolved at run time. Never logged. |
on_existing_data |
enum | fail |
Target collision policy: fail, truncate, drop_create. append is rejected in the MVP. |
strict_autodetect |
bool | false |
Fail the run when auto-detect finds uncovered PII columns. |
replicate_all_indexes |
bool | false |
Replicate every source index, not just unique/PK indexes. |
batch_size |
int | 10000 |
Default streaming batch size in rows (must be ≥ 1). |
audit_log |
bool | true |
Write the per-run audit log to _privaci.audit_log. |
auto_detect |
bool | true |
Run the zero-config PII column scanner. |
implied_fk_ignore |
list[string] | [] |
Source column paths (schema.table.column) whose implied FK warnings are silenced. |
tables |
mapping | {} |
Table identifier → table config. Keys must match the catalog (schema.table, e.g. public.users). |
Auto-detect¶
When auto_detect: true (default), PrivaCI inspects every column name against a
built-in pattern library and assigns a confidence score:
| Confidence | Run-time behaviour |
|---|---|
high |
Auto-apply the inferred action (e.g. fake for email, ner_mask for long clinical notes). |
medium |
Flag for manual review in privaci dry-run --report; passthrough unless YAML overrides. |
low |
Passthrough. |
Explicit YAML column entries always win, including passthrough. Structured
patterns (email, ssn, phone, …) are high on name match alone. Freeform
patterns (note*, description, …) additionally require column type (text
or varchar ≥ 500) and use pg_stats.avg_width plus table-name context — see
ADR-0011.
Review detections without writing data:
privaci dry-run --config mask-rules.yaml --report review.md
With strict_autodetect: true, any high or medium finding not explicitly
listed under tables.<id>.columns fails validation (exit 3).
Version policy. A config whose
versionis not"1.0"is rejected by the v1.x engine. A future engine reading an older config will direct you toprivaci migrate-config --from <old> --to <new> <path>. When--fromequals--to,migrate-configis a no-op and exits0.
Table configuration¶
Each entry under tables accepts:
| Key | Type | Default | Description |
|---|---|---|---|
strategy |
enum | transform |
transform (mask + copy), exclude (drop in target), empty (recreate, no rows), truncate (empty before copy). |
columns |
mapping | {} |
Column name → action. |
batch_size |
int | inherits global | Per-table override (must be ≥ 1). |
null_orphan_fks |
bool | false |
Set FK columns whose referent is lost to NULL instead of failing. |
empty creates the table DDL on the target but streams zero rows and marks the
table checkpoint done. truncate does the same after TRUNCATE on an
existing target table — useful when you need the schema but not the data.
Objects not replicated¶
Views, materialized views, triggers, rules, and logical-replication publications
are never copied to the target. Each skipped object is recorded in
_privaci.audit_log as skipped_object with a kind payload (view,
materialized_view, trigger, rule, or publication).
Actions¶
Every column action is selected by its action discriminator. Unknown action
names and extra keys are rejected with the offending YAML path.
| Action | Required params | Optional params | Notes |
|---|---|---|---|
fake |
provider |
seed_alias, params |
Deterministic synthetic value from a registered provider. Unknown provider names fail validation (exit 3). |
regex_mask |
pattern, replace |
flags |
pattern must compile; unknown flags are rejected. |
hash |
— | — | Salted, deterministic hash. |
passthrough |
— | — | Copy the value unchanged. |
null |
— | — | Write NULL. Rejected at pre-flight on NOT NULL columns. |
static |
value |
— | Replace every value with a constant. |
ner_mask |
— | entities |
Level-2 SpaCy NER. entities defaults to PERSON, ORG, GPE, LOC. |
ai_refine |
provider, model |
params |
Level-3, commercial only. Rejected when privaci-commercial is not installed. |
Built-in fake providers¶
| Provider | Output shape |
|---|---|
first_name, last_name, full_name |
Synthetic person names |
email |
local@domain using fake domains (fakedom.net, example.test, tryvault.dev by default); params.domain overrides |
phone |
E.164 test number; preserves a leading country code when detectable |
street, city, postcode, country, address |
Synthetic postal components or a one-line address |
dob |
ISO-8601 date within ±5 years of the input age bracket |
ip_address |
RFC 5737 TEST-NET address |
ssn |
000-099-XXXX SSA advertising test range |
credit_card |
Luhn-valid number from documented test BINs |
uuid |
Deterministic UUIDv4 layout |
company, job_title, username |
Synthetic org / role / handle |
password |
Always the fixed placeholder privaci-test-pw (never a real-looking hash) |
Register custom providers from plugins via privaci.contracts.register_provider.
Examples¶
tables:
public.users:
columns:
first_name: { action: fake, provider: first_name }
email: { action: fake, provider: email, seed_alias: user_email }
ssn: { action: regex_mask, pattern: "^\\d{3}-\\d{2}-\\d{4}$", replace: "000-00-0000" }
password_hash: { action: hash }
notes: { action: ner_mask } # masks PERSON/ORG/GPE/LOC
audit_internal.audit_log_events:
strategy: exclude # table omitted in target
seed_alias for foreign keys¶
Salt-hashed faking already keeps the same input → same output across columns.
When two columns hold the same logical value under different column names
(for example an FK orders.customer_email pointing at users.email), give them
a shared seed_alias so they fake to the same value and the relationship
survives:
tables:
public.users:
columns:
email: { action: fake, provider: email, seed_alias: user_email }
public.orders:
columns:
customer_email: { action: fake, provider: email, seed_alias: user_email }
Implied (soft) foreign keys¶
Postgres only enforces relationships you declare. Many real schemas carry
soft references — a column like clinical.patient_documents.referring_provider_email
that informally points at clinical.providers.email with no FOREIGN KEY
constraint. PrivaCI cannot see these in the catalog, so it would fake the two
columns independently and silently break the link.
During introspection PrivaCI flags these. A column whose name ends in a known
suffix (_email, _username, _user_id, _mrn) is matched against
single-column UNIQUE columns elsewhere in the source. When a match is
found and no catalog FK exists, the engine emits one implied_fk_warning
naming both columns and suggesting a seed_alias:
Warning: [implied_fk_warning] Table clinical.patient_documents column
referring_provider_email looks like a soft reference to
clinical.providers.email (UNIQUE) but no catalog foreign key exists. Add
'seed_alias: clinical.providers.email' on
clinical.patient_documents.referring_provider_email to keep masked values
consistent across both columns.
The warning never blocks the run. Apply the fix by giving both the referenced
column and the soft reference a shared seed_alias (see above). To silence a
warning you have reviewed and accepted, list its source column path under
implied_fk_ignore:
implied_fk_ignore:
- clinical.patient_documents.referring_provider_email
Running a mask job¶
privaci run (or bare privaci) loads this file, resolves secrets, runs
pre-flight checks, then streams masked rows to the target:
export SOURCE_DB_URL=postgresql://postgres:dev@127.0.0.1:55432/privaci_source
export TARGET_DB_URL=postgresql://postgres:dev@127.0.0.1:55433/privaci_target
export ANONYMIZATION_SALT="$(privaci gen-salt)"
privaci run --config mask-rules.yaml
privaci dry-run --config mask-rules.yaml # pre-flight only (no target wipe)
| Flag | Description |
|---|---|
--config |
Path to this file (default: /config/mask-rules.yaml). |
--source |
Source DB URL or secret URI (default: SOURCE_DB_URL). |
--target |
Target DB URL or secret URI (default: TARGET_DB_URL). |
--dry-run |
On run only: pre-flight + per-table summary, no writes. |
--no-audit-table |
Skip _privaci.audit_log writes for this run. |
Salt resolution: global_salt in this file (supports ${ANONYMIZATION_SALT}
and secret URIs), else the ANONYMIZATION_SALT environment variable.
Secret URIs (aws-sm://, azure-kv://, vault://, env://, file://) use
bounded connect/read timeouts and fail closed on backend errors. The file://
backend only reads regular files under allowed roots (default:
/run/secrets, /var/run/secrets); override with a :-separated list in
PRIVACI_SECRET_FILE_ROOTS. Symlinks and paths outside those roots are rejected.
Verifying a run¶
privaci verify audits a completed run by comparing the target against the
source. It is value-free: it reports only counts, rates, and verdicts —
never raw cell values — so it is safe to run in any environment, including CI.
privaci verify --config mask-rules.yaml
# Verification: 77 passed, 0 warning(s), 0 failure(s).
It samples rows (default 1,000 per table, --sample-size) and runs these checks:
| Check | Verdict on failure | Catches |
|---|---|---|
column.change_rate |
FAIL if 0% changed, WARN if partial | A masked column whose values didn't change (mask not applied) |
column.passthrough_drift |
FAIL | A passthrough column that changed unexpectedly |
table.row_count |
FAIL | Rows dropped or duplicated vs source |
table.uniqueness |
FAIL | Faker collisions breaking a PK/UNIQUE constraint |
table.fk_integrity |
FAIL | Orphaned foreign keys after masking |
verify exits 1 when any check FAILs (warnings do not fail the command), so
it can gate a pipeline. Tables without a single-column primary key skip the
row-level checks (reported as a WARN) but still get structural checks.
| Flag | Description |
|---|---|
--config |
Path to this file (default: /config/mask-rules.yaml). |
--source / --target |
DB URLs or secret URIs (default: SOURCE_DB_URL / TARGET_DB_URL). |
--sample-size |
Rows per table to sample for row-level checks (default: 1000). |
Resuming an interrupted run¶
After a SIGINT/SIGTERM (exit 130), checkpoints in _privaci.table_checkpoints
record per-table progress. Resume only when config, source URL, and salt are
unchanged:
privaci resume --config mask-rules.yaml
If identity fields drift, resume exits 2 with a structured message. Tables
already marked done in checkpoints are skipped; in-progress tables continue
from last_pk_value.
Generating CI workflows¶
privaci generate-ci --platform github-actions
# Writes .github/workflows/privaci-refresh.yml and docs/privaci-setup.md
privaci generate-ci --platform gitlab-ci
privaci generate-ci --platform k8s-cronjob
Use --output-dir to write files somewhere other than the current directory.
Installing a config pack¶
privaci install-pack hipaa --config mask-rules.yaml
# Preview merge, prompt for confirmation
privaci install-pack hipaa --config mask-rules.yaml --yes
privaci install-pack hipaa --local-pack-dir ./packs --config mask-rules.yaml
Manifests are verified against a trusted Ed25519 public key before any local
file is modified. Invalid signatures exit 3.
The engine ships no built-in key. You must provide the official release key
through the PRIVACI_PACK_PUBLIC_KEY environment variable (a hex-encoded 32-byte
Ed25519 public key). When it is unset, verification fails closed and
install-pack aborts before touching your config:
# Official trust anchor (v0.1.0-beta.1 and later, until rotated):
export PRIVACI_PACK_PUBLIC_KEY="cd965cb6dadcecefd508ae84a000684f431490c3d3ddae006ad5f89bf2c25978"
privaci install-pack hipaa --config mask-rules.yaml
| Situation | Result |
|---|---|
PRIVACI_PACK_PUBLIC_KEY unset or invalid |
Exit 3, no files modified |
| Key set, signature invalid/tampered | Exit 3, no files modified |
| Key set, signature valid | Merge proceeds after preview/confirmation |
The release pipeline provisions this key; operators installing official packs copy it from the published release notes. See the pack-signing runbook for how the key is generated and rotated.
Compliance reports¶
privaci report --run <run-uuid> --format json
privaci report --run <run-uuid> --format json --output report.json
Community mode emits a JSON stub; signed PDF reports require the commercial layer.
Validation rules at a glance¶
All violations below exit 3 and name the YAML path:
- Unknown top-level or per-action keys (
extra = forbid). - Missing or unsupported
version. - Misspelled
action(lists the valid action tags). regex_maskwith a non-compilablepatternor an unknown flag.on_existing_data: append(unsupported in the MVP).batch_size < 1(global or per-table).action: ai_refinewithout the commercial layer installed.action: nullon aNOT NULLcolumn (checked during pre-flight against the live catalog).- Configured table names absent from the source catalog (pre-flight exit
3).
Related¶
- CLI reference — every
privacisubcommand and its options. - Error codes — exit codes and message format.
- Extending PrivaCI — the commercial plugin model that
unlocks
ai_refine.