Documentation

Concepts

The mental model behind zod4-mock. Read this once and the API will feel obvious.


World

A World is a seeded generation session. It holds the PRNG, the registry, and all schema registrations.

const world = createWorld({ seed: 42 })
  .withSchema(PersonSchema)
  .withSchema(DocumentSchema, { relations: { author: PersonSchema } });

One world = one seed = one deterministic dataset. All schemas registered on a world share the same PRNG state and registry, which is what makes cross-schema consistency possible.

Options

seed (number; required)
Master seed. Same seed → same output.
locale (LocaleData; default: minimal en)
Active locale. Defaults to a built-in minimal English locale; import a richer one from @zod4-mock/locale-en / @zod4-mock/locale-nl. See Localization.
optionalProbability (number; default: 0.2)
Chance that z.optional() / z.nullable() fields are omitted.
defaultArrayLength ([number, number]; default: [1, 5])
Fallback array length when no .min() / .max() is set.
generators (Record<string, KeyGenerator>; default: {})
Custom key-based generators applied globally.
recursionLimit (number; default: 8)
Max depth for self-referential / recursive schemas.

Schemas

Every schema you register with withSchema is tracked by the world. There are three registration modes:

Primary — identity anchor

world.withSchema(PersonSchema);

A primary schema generates independent instances. The world cycles through them deterministically as you call generate(). Instances are stored in the registry and can be referenced by other schemas.

Derived — projection of another schema

world.withSchema(PersonSummarySchema, {
  from: PersonSchema,
  matchers: {
    id: (ctx) => ctx.source.personId,
    name: (ctx) => `${ctx.source.firstName} ${ctx.source.lastName}`,
  },
});

from: binds this schema to a primary schema. Each generated instance of PersonSummarySchema is a projection of the corresponding PersonSchema instance. ctx.source holds the source entity's data.

Relational — linked to other schemas

world.withSchema(DocumentSchema, {
  relations: { author: PersonSchema },
  matchers: {
    authorId: (ctx) => ctx.related("author").personId,
  },
});

relations declares which other schemas this one references. ctx.related("author") resolves to the data of a specific instance of PersonSchema.

All three modes can be combined — a schema can have both from and relations.
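
To make the combination concrete, here is a minimal standalone sketch of how matchers might read both ctx.source and ctx.related on the same schema. It does not use zod4-mock; the Ctx interface, runMatchers, and the sample persons are hypothetical stand-ins that only mirror the member names used in the examples above.

```typescript
interface Person { personId: string; firstName: string; lastName: string }

// Hypothetical, trimmed-down ctx: only the two members used in the examples above.
interface Ctx { source: Person; related: (name: string) => Person }
type Matchers = Record<string, (ctx: Ctx) => unknown>;

// Run every matcher against the same ctx and collect the results per field.
function runMatchers(matchers: Matchers, ctx: Ctx): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, matcher] of Object.entries(matchers)) out[field] = matcher(ctx);
  return out;
}

const ada: Person = { personId: "p-1", firstName: "Ada", lastName: "Lovelace" };
const alan: Person = { personId: "p-2", firstName: "Alan", lastName: "Turing" };

const ctx: Ctx = {
  source: ada, // what "from:" would bind
  related: (name) => {
    // what "relations: { author: ... }" would resolve
    if (name !== "author") throw new Error(`unknown relation "${name}"`);
    return alan;
  },
};

const combined = runMatchers(
  {
    id: (c) => c.source.personId,
    name: (c) => `${c.source.firstName} ${c.source.lastName}`,
    authorId: (c) => c.related("author").personId,
  },
  ctx,
);
console.log(combined);
```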


The generation pipeline

For every field in a schema, values are resolved in this priority order — the seven named steps of the canonical PIPELINE list in src/pipeline.ts. The first step that produces a value wins:

  1. Eager overrides: primitive and array entries from options.overrides land in ctx.current, so sibling matchers can read them via ctx.current.<sibling>.
  2. Matchers: explicit per-field user functions from withSchema({ matchers }).
  3. Per-schema key map: entries from withKeyMap({ ... }), matched on the field name.
  4. Unwrap optional: strips optional/nullable/default and rolls absent per layer; sets ctx.inner for downstream steps. Internal; does not produce a final value on its own.
  5. World-level custom generators: entries from withGenerators({ ... }), matched on the field name.
  6. Key-based heuristics: built-in DEFAULT_KEY_MAP exact-key matches plus DEFAULT_KEY_PATTERNS regex matches. email → realistic email, firstName → first name, createdAt → date.
  7. Schema-based fallback: Zod type introspection. z.enum([...]) → random member, z.number().int().min(1).max(100) → integer in range, etc. Always resolves.
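
The first-match-wins rule can be pictured as a chain of resolver functions tried in order. The sketch below is not the contents of src/pipeline.ts; the Step type, resolveField, and the three toy steps are hypothetical names that only illustrate the control flow.

```typescript
// Hypothetical shape: a step returns a value, or undefined for "no opinion".
type Step = { name: string; run: (key: string) => unknown };

function resolveField(key: string, steps: Step[]): { value: unknown; by: string } {
  for (const step of steps) {
    const value = step.run(key);
    if (value !== undefined) return { value, by: step.name }; // first hit wins
  }
  throw new Error(`no step resolved "${key}"`);
}

// Toy stand-ins for three of the seven steps.
const userMatchers = new Map<string, () => unknown>([["name", () => "Ada Lovelace"]]);
const keyHeuristics = new Map<string, () => unknown>([
  ["name", () => "John Smith"],
  ["email", () => "john@example.com"],
]);

const steps: Step[] = [
  { name: "matchers", run: (key) => userMatchers.get(key)?.() },
  { name: "key heuristics", run: (key) => keyHeuristics.get(key)?.() },
  { name: "schema fallback", run: () => "lorem" }, // always resolves
];

const resolvedName = resolveField("name", steps);
const resolvedEmail = resolveField("email", steps);
const resolvedNickname = resolveField("nickname", steps);
console.log(resolvedName.by, resolvedEmail.by, resolvedNickname.by);
```

Because the matcher for name sits earlier in the chain, the heuristic for the same key never runs; a key that nothing recognises drops through to the fallback.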

After the pipeline

Once the pipeline returns a value for a field, two wrapping passes finish the record:

  • Override deep-merge: options.overrides is deep-merged onto the pipeline's value (covers nested-object slices that the eager-overrides step didn't consume; B12 contract).
  • Transform: options.transform is called on the merged value.
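
As a rough illustration of those two passes, the sketch below deep-merges an override slice onto a generated record and then applies a transform. The deepMerge helper and the sample data are assumptions made for the example; the library's own merge rules (for arrays, dates, and so on) may differ.

```typescript
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge patch onto base; non-object values in patch replace base values.
function deepMerge(base: Plain, patch: Plain): Plain {
  const out: Plain = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    const current = out[key];
    out[key] = isPlainObject(current) && isPlainObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

const generated: Plain = {
  name: "John Smith",
  address: { street: "Main St 1", city: "Springfield" },
};
const overrides: Plain = { address: { city: "Utrecht" } }; // nested slice only
const transform = (record: Plain): Plain => ({
  ...record,
  name: String(record.name).toUpperCase(),
});

// Order matters: merge first, then transform the merged value.
const finished = transform(deepMerge(generated, overrides));
console.log(finished);
```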

You only need to provide matchers for fields the pipeline can't resolve correctly on its own.


The ctx object

Every matcher receives a ctx with:

ctx.gen
Generator library with PRNG pre-applied. ctx.gen.person.firstName(), ctx.gen.internet.email(), ctx.gen.finance.amount(10, 999).
ctx.prng
Raw PRNG for custom ranges. ctx.prng.int(min, max), ctx.prng.pick([...]), ctx.prng.random().
ctx.source
Data of the source schema instance (only when from: is declared).
ctx.related(name)
Resolves and returns the data of a related schema instance.
ctx.registry
Access to all generated data.
ctx.fieldPath
Dot-path of the field being generated, e.g. "address.street".

ctx.gen — generator library

The full generator namespace, with the PRNG already bound. You never pass prng manually:

matchers: {
  name:     (ctx) => ctx.gen.person.fullName(),
  email:    (ctx) => ctx.gen.internet.email(),
  city:     (ctx) => ctx.gen.location.city(),
  iban:     (ctx) => ctx.gen.finance.iban(),
  sentence: (ctx) => ctx.gen.word.sentence(),
}

Generators that take arguments work the same way — the PRNG is the first argument and is applied automatically:

(ctx) => ctx.gen.string.alphanumeric(8)   // length = 8
(ctx) => ctx.gen.finance.amount(10, 999)  // min, max

The registry

Every generated primary schema instance is stored in the registry. Other matchers can look it up to establish cross-schema consistency.

// Pick a random instance of a registered schema
const person = ctx.registry.pick(PersonSchema);

// Get all instances
const people = ctx.registry.all(PersonSchema);

// Filter all matching a predicate
const active = ctx.registry.filter(PersonSchema, (p) => p.active);

Registry lookups are typed from the schema — no manual type casts needed.

pick() throws if the registry has no instances of that schema yet. Generate the referenced schema before the one that references it.
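
The ordering requirement is easier to see with a toy registry. The sketch below is a standalone stand-in, not the library's implementation: SchemaKey, createMiniRegistry, and the injected random function are hypothetical, and only the method names pick, all, and filter come from the text above.

```typescript
// Hypothetical typed key; the phantom __type field lets lookups infer T.
interface SchemaKey<T> { readonly name: string; readonly __type?: T }

function createMiniRegistry(random: () => number) {
  const store = new Map<string, unknown[]>();

  function all<T>(schema: SchemaKey<T>): T[] {
    return [...((store.get(schema.name) ?? []) as T[])];
  }

  return {
    add<T>(schema: SchemaKey<T>, instance: T): void {
      store.set(schema.name, [...(store.get(schema.name) ?? []), instance]);
    },
    all,
    filter<T>(schema: SchemaKey<T>, predicate: (item: T) => boolean): T[] {
      return all(schema).filter(predicate);
    },
    pick<T>(schema: SchemaKey<T>): T {
      const items = all(schema);
      if (items.length === 0) throw new Error(`no instances of ${schema.name} yet`);
      return items[Math.floor(random() * items.length)];
    },
  };
}

interface PersonRow { personId: string; active: boolean }
const PersonKey: SchemaKey<PersonRow> = { name: "Person" };
const registry = createMiniRegistry(() => 0.5); // fixed "random" value for the demo

// Picking before anything has been generated fails.
let emptyPickFailed = false;
try {
  registry.pick(PersonKey);
} catch {
  emptyPickFailed = true;
}

registry.add(PersonKey, { personId: "p-1", active: true });
registry.add(PersonKey, { personId: "p-2", active: false });
const activePeople = registry.filter(PersonKey, (p) => p.active);
const picked = registry.pick(PersonKey);
console.log(emptyPickFailed, activePeople.length, picked.personId);
```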


Composable nested schemas

Matchers registered for a schema apply automatically wherever that schema appears — including nested inside another schema's fields.

const world = createWorld({ seed: 42 })
  .withSchema(AddressSchema, {
    matchers: {
      street: (ctx) => ctx.gen.location.street(),
      city: (ctx) => ctx.gen.location.city(),
    },
  })
  .withSchema(PersonSchema); // PersonSchema has address: AddressSchema

// PersonSchema's address field uses AddressSchema's matchers automatically
const person = world.generate(PersonSchema);

Determinism

Two guarantees make generated data stable:

Same seed → same output. The PRNG is deterministic (SFC32). Rebuild the world with the same seed and the same builder chain; you get byte-identical data.

Per-field seeding. Each field gets an independent PRNG derived from hash(worldSeed + schemaId + fieldPath). Adding or removing a field from a schema does not disturb the values of other fields. The lastName of instance #1 has the same value before and after you add a middleName field.

This means you can add fields to schemas mid-project without invalidating existing test snapshots.
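
The following sketch shows why per-field seeding has that property. It is an illustration under stated assumptions: the FNV-1a hash, the way the SFC32 state is initialised, and the generateRecord helper are all hypothetical, and a real implementation presumably also mixes an instance index into the key.

```typescript
// FNV-1a string hash (an assumed stand-in for the library's hash function).
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// SFC32 generator returning floats in [0, 1).
function sfc32(a: number, b: number, c: number, d: number): () => number {
  return () => {
    const t = (((a + b) | 0) + d) | 0;
    d = (d + 1) | 0;
    a = b ^ (b >>> 9);
    b = (c + (c << 3)) | 0;
    c = (c << 21) | (c >>> 11);
    c = (c + t) | 0;
    return (t >>> 0) / 4294967296;
  };
}

// Each field gets its own generator, keyed only by seed, schema, and path.
function fieldPrng(worldSeed: number, schemaId: string, fieldPath: string): () => number {
  const h = fnv1a(`${worldSeed}|${schemaId}|${fieldPath}`);
  const next = sfc32(h, h ^ 0x9e3779b9, h ^ 0x85ebca6b, 1);
  for (let i = 0; i < 12; i++) next(); // warm-up rounds to mix the state
  return next;
}

function generateRecord(worldSeed: number, schemaId: string, fields: string[]): Record<string, number> {
  const record: Record<string, number> = {};
  for (const field of fields) record[field] = fieldPrng(worldSeed, schemaId, field)();
  return record;
}

const before = generateRecord(42, "Person", ["firstName", "lastName"]);
const after = generateRecord(42, "Person", ["firstName", "middleName", "lastName"]);
console.log(before.lastName === after.lastName); // true
```

With a single shared stream, inserting middleName would shift every later draw; deriving the stream from the field path removes that coupling.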


Localization

A locale decides what data the generators draw from — names, words, currencies, date formats, address shapes, phone formats, and so on. The world carries a single locale; all generators read from it.

zod4-mock ships a built-in minimal English locale that's used when you don't pass locale. It has small curated word/name arrays — enough to be valid, deliberately not realistic. Output looks like "John Smith", "Section", "$128.94".

For realistic output, install a locale package and pass it to createWorld:

import { createWorld } from "zod4-mock";
import { en } from "@zod4-mock/locale-en"; // Markov-trained English
import { nl } from "@zod4-mock/locale-nl"; // Markov-trained Dutch

createWorld({ seed: 42, locale: en });
createWorld({ seed: 42, locale: nl });

A locale is a plain LocaleData object — sections for person, address, commerce, company, word, finance, date, color, phone. Locales can supply either Markov models (firstNamesMale, nounModel) or plain arrays (simpleFirstNamesMale, nouns); generators prefer the model when present.

For variants, use extend() (re-exported from each locale package, e.g. @zod4-mock/locale-en):

import { createWorld } from "zod4-mock";
import { en, extend } from "@zod4-mock/locale-en";

const enGB = extend(en, {
  address: { ...en.address, phonePrefix: "+44", countryCode: "GB", ibanPrefix: "GB" },
  commerce: { ...en.commerce, formatPrice: (n) => `£${n.toFixed(2)}` },
});

See the API reference for the full LocaleData interface.

Zipf-default picks on open corpora

zod4-mock's open-corpus pickers (e.g. person.firstName, person.lastName) draw from frequency-sorted locale arrays via prng.pickZipf(items, s) — a single closed-form inverse-CDF Zipf draw — rather than uniform prng.pick(items). The exponent s is resolved per call site as locale.frequencyExponentOverrides?.[corpus] ?? locale.frequencyExponent ?? 1.0, so shipped locales bias the head of each list toward the real world's frequency curve: "john" shows up far more than "aaden", mirroring SSA / Census distributions.

This is a deliberate divergence from faker, whose default is uniform across each list. If you prefer faker-style uniform output, set frequencyExponent: 0 (or override an individual corpus) on your locale.

Unique contexts auto-flatten to uniform. When you request world.generate(schema, { unique: true }), the engine flattens s to 0 for every pickZipf call inside that loop — uniqueness wins over realism. The flag has no opt-out; matchers that need head-skewed picks inside a unique loop should call ctx.prng.pickZipf(arr, s) directly with an explicit s.

Closed / enumerable corpora (states, months, weekdays, currencies, etc.) ignore the Zipf surface entirely and stay on prng.pick.
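
A minimal sketch of what such a draw could look like, assuming a continuous power-law approximation inverted in closed form. The library's actual formula is not shown in this document, so the body of pickZipf below, the explicit u parameter, and the LocaleExponents interface are assumptions; only the exponent fallback chain and the unique-flattening rule are taken from the text above.

```typescript
interface LocaleExponents {
  frequencyExponent?: number;
  frequencyExponentOverrides?: Record<string, number>;
}

// Exponent resolution as described above; unique contexts flatten s to 0.
function resolveExponent(locale: LocaleExponents, corpus: string, unique = false): number {
  if (unique) return 0;
  return locale.frequencyExponentOverrides?.[corpus] ?? locale.frequencyExponent ?? 1.0;
}

// Assumed approximation: density x^(-s) on [1, n + 1), inverse CDF in closed form.
// u is a uniform draw in [0, 1); items are sorted most frequent first.
function pickZipf<T>(items: readonly T[], s: number, u: number): T {
  const n = items.length;
  if (n === 0) throw new Error("cannot pick from an empty list");
  const x =
    Math.abs(s - 1) < 1e-9
      ? Math.pow(n + 1, u) // s = 1: the CDF is ln(x) / ln(n + 1)
      : Math.pow(u * (Math.pow(n + 1, 1 - s) - 1) + 1, 1 / (1 - s));
  const index = Math.min(n - 1, Math.max(0, Math.floor(x) - 1));
  return items[index];
}

const names = ["john", "mary", "li", "aaden"]; // frequency-sorted sample corpus
const locale: LocaleExponents = {
  frequencyExponent: 1.2,
  frequencyExponentOverrides: { firstName: 1.0 },
};

const sFirst = resolveExponent(locale, "firstName");
const sUnique = resolveExponent(locale, "firstName", true);
console.log(pickZipf(names, sFirst, 0.5)); // "mary"
console.log(pickZipf(names, sUnique, 0.5)); // "li"
```

With s = 0 the expression reduces to 1 + n * u, which is an ordinary uniform pick; larger s pushes more of the unit interval onto the first few indices.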

Realistic numeric distributions

The same realism axis applies on the numeric side. Money keys (amount, balance, total, revenue, cost, fee, salary, price, …) draw log-uniform, min * Math.pow(max / min, u) with u a uniform draw between 0 and 1, so values with a leading digit of 1 appear about 30% of the time (Benford's law), matching real-world ledgers instead of faker's flat uniform-over-range. Scale-free measurement keys (fileSize, bytes, views, population, distance) follow the same log-uniform default, with Math.round on the integer routes. age is a clipped log-normal centred on μ = ln(36) (US Census median adult), year is exponentially skewed toward the present (λ = 0.05), and quantity / count are truncated geometrics with p = 0.5 (modal at the lower bound).

Three semantically meaningful keys stay bounded-uniform on a pinned default range: rating ([0, 5]), score and percentage ([0, 100]).

Un-keyed auto-flip on z.number(). A plain (un-routed) numeric field auto-flips to log-uniform when all four of these hold: min > 0, log10(max / min) ≥ 3 (≥ 3 orders of magnitude), !schema.isInt, and no .multipleOf. Anything else stays on the plain uniform draw. The threshold (3 orders) is deliberately wide enough to catch obvious file-size / view-count cases without misfiring on probabilities (.min(0.01).max(1)) or sub-percent ranges.

Cross-zero or non-positive ranges always fall back to uniform (the log-uniform formula is undefined for min ≤ 0) — zod4-mock does not silently shift your stated bounds with an epsilon. To opt out per-key, use withGenerators (see docs/recipes.md).
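
The log-uniform formula and the four auto-flip conditions can be written down directly. This is a sketch, not the library's code: logUniform, shouldAutoFlip, and the NumberBounds shape are hypothetical names, and u stands for a uniform draw that the caller supplies.

```typescript
// min * (max / min)^u: uniform in log space between min and max.
function logUniform(min: number, max: number, u: number): number {
  if (!(min > 0) || !(max >= min)) throw new RangeError("log-uniform needs 0 < min <= max");
  return min * Math.pow(max / min, u);
}

// Hypothetical summary of what would be read off a z.number() schema.
interface NumberBounds {
  min?: number;
  max?: number;
  isInt: boolean;
  multipleOf?: number;
}

// All four conditions from the text must hold for the auto-flip.
function shouldAutoFlip(bounds: NumberBounds): boolean {
  if (bounds.min === undefined || bounds.max === undefined) return false;
  if (bounds.isInt || bounds.multipleOf !== undefined) return false;
  return bounds.min > 0 && Math.log10(bounds.max / bounds.min) >= 3;
}

console.log(shouldAutoFlip({ min: 1, max: 1_000_000, isInt: false })); // true
console.log(shouldAutoFlip({ min: 0.01, max: 1, isInt: false })); // false (2 orders of magnitude)
console.log(shouldAutoFlip({ min: -5, max: 1_000_000, isInt: false })); // false (crosses zero)

// Midpoint in log space is the geometric mean, roughly 100 here.
const midpoint = logUniform(10, 1000, 0.5);
console.log(midpoint);
```

Over a range that spans whole decades, the share of draws with leading digit 1 is log10(2), about 30.1%, which is where the Benford figure comes from.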


Populate

Use populate() to pre-create a fixed number of instances before generation starts. This is useful when you need other schemas to reference a specific number of entities:

const world = createWorld({ seed: 42 })
  .withSchema(PersonSchema)
  .withSchema(DocumentSchema, { relations: { author: PersonSchema } })
  .populate(PersonSchema, 5); // ensure exactly 5 persons exist

const documents = world.generate(z.array(DocumentSchema).min(20));
// Every generated document (at least 20) references one of the 5 persons

Optional and nullable fields

optionalProbability (default 0.2) controls how often z.optional() and z.nullable() fields are omitted.

createWorld({ seed: 42, optionalProbability: 0 }); // always present
createWorld({ seed: 42, optionalProbability: 1 }); // always absent

For test assertions on optional fields, either set optionalProbability: 0 or pin the field with overrides.
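
The two boundary values behave as described because the absent roll is a simple threshold test. The sketch below assumes the roll compares a uniform draw in [0, 1) against optionalProbability; rollAbsent and the evenly spaced draw sequence are hypothetical, used only to make the arithmetic visible.

```typescript
// Assumed rule: the field is absent when a uniform draw in [0, 1) falls below the probability.
function rollAbsent(optionalProbability: number, random: () => number): boolean {
  return random() < optionalProbability;
}

// Deterministic stand-in for a PRNG: 100 evenly spaced values in [0, 1).
function evenDraws(count: number): () => number {
  let i = 0;
  return () => ((i++ % count) + 0.5) / count;
}

function countAbsent(optionalProbability: number, rolls: number): number {
  const random = evenDraws(rolls);
  let absent = 0;
  for (let i = 0; i < rolls; i++) {
    if (rollAbsent(optionalProbability, random)) absent++;
  }
  return absent;
}

console.log(countAbsent(0, 100)); // 0: always present
console.log(countAbsent(1, 100)); // 100: always absent
console.log(countAbsent(0.2, 100)); // 20: the default rate
```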