Agents Playbook

Pillar — Quality

How to know the code works without manually reviewing every agent-produced diff.

Status

◐ Scoped, not yet detailed.

Scope

| Concern | Universal principle | Concrete pattern |
|---|---|---|
| Test pyramid | Unit > integration > E2E; cover the boundary contracts heavily | Vitest unit + Playwright E2E + contract-level params/result parse tests |
| Coverage target | >90% per shipped package, measured against statements | Per-package coverage threshold in CI; per-package, not whole-repo |
| Mutation testing | Beats coverage as a quality signal once the unit suite is good | Stryker or another mutation tool, on stable utilities first |
| Property / fuzz | Test the laws code must obey; attack parsers + crypto with hostile bytes | Generated inputs + shrinking; fuzz at the trust boundary |
| Adversarial bug-hunt | One agent says "looks fine"; a refute-then-reproduce loop finds real logic bugs | Orthogonal lenses → skeptic refute → failing repro; loop until dry |
| Fail-loud defaults | A no-op default that the test harness always overrides hides missing prod wiring | Fail loud when unwired, or assert the real binding in an integration test |
| Hermetic tests | Component-level Vitest preferred over live-app E2E | Reproduce + lock bugs via in-process tests, not Playwright |
| Verify-first close | Before reproducing an issue, check whether it's already fixed | Default to `gh issue view <n>` at session start |
| File-size gate | See architecture pillar | Baseline shrink-only |
| Lint gates | No `any`, no `console.log`, no default exports, no nested ternaries, no raw HTML | ESLint rule pack + per-file overrides |
| Quality-gates script | One `pnpm check:quality-gates` for fast structural checks | Parallel: lint + typecheck + secrets + size + intl + tokens |
| Sanity script | One `pnpm sanity` for cross-cutting rule audit | Generates `docs/audit/sanity-report.md`; CI fails on regressions |
| Pre-push hook | Runs structural gates + ADR/RFC checks; not full tests | Husky pre-push; tests on CI |
| Concurrency safety | Agents merge PRs against fast-moving main | Stash-verify red, rebase, retry; never `--theirs`/`--ours` blindly |
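The Mutation testing row claims mutation testing beats coverage. This toy sketch shows why; it is not how Stryker works internally. A hypothetical `isAdult` function gets three hand-written mutants. Two suites both reach 100% statement coverage, but only the one with boundary cases kills every mutant.

```typescript
type AgeCheck = (age: number) => boolean;
type Suite = Array<[number, boolean]>; // (input, expected) pairs

// Hypothetical code under test.
const isAdult: AgeCheck = (age) => age >= 18;

// Hand-written mutants; a real tool derives these from the source automatically.
const mutants: Record<string, AgeCheck> = {
  ">= becomes >": (age) => age > 18,
  ">= becomes <": (age) => age < 18,
  "18 becomes 19": (age) => age >= 19,
};

function passes(fn: AgeCheck, suite: Suite): boolean {
  return suite.every(([input, expected]) => fn(input) === expected);
}

// A mutant "survives" when the suite still passes against it.
function survivingMutants(suite: Suite): string[] {
  if (!passes(isAdult, suite)) throw new Error("suite must pass on the original code");
  return Object.entries(mutants)
    .filter(([, mutant]) => passes(mutant, suite))
    .map(([name]) => name);
}

// Mutation score: fraction of mutants the suite kills.
function mutationScore(suite: Suite): number {
  const total = Object.keys(mutants).length;
  return (total - survivingMutants(suite).length) / total;
}

// Both suites execute isAdult's only statement, so both have 100% statement coverage.
const weakSuite: Suite = [[30, true], [5, false]];
const boundarySuite: Suite = [...weakSuite, [18, true], [17, false]];
```

Against `weakSuite`, two of the three mutants survive (score 1/3) even though coverage is full. Adding the 17 and 18 boundary cases kills all three.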
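For the Property / fuzz row, the loop is: state a law, generate inputs until one breaks it, then shrink the failure to a minimal case. This is a dependency-free sketch under several assumptions. The code under test is a hypothetical run-length codec. The generator and greedy shrinker are hand-rolled. The seeded PRNG is mulberry32. A library such as fast-check does all of this far more thoroughly. The alphabet deliberately includes a digit, which is hostile input for a format that uses digits as counts.

```typescript
// Hypothetical code under test: run-length codec, e.g. "aaab" -> "3a1b".
function encode(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; ) {
    let j = i;
    while (j < s.length && s[j] === s[i]) j++;
    out += `${j - i}${s[i]}`;
    i = j;
  }
  return out;
}

function decode(t: string): string {
  let out = "";
  const re = /(\d+)(\D)/g; // count followed by one non-digit character
  let m: RegExpExecArray | null;
  while ((m = re.exec(t)) !== null) out += m[2].repeat(Number(m[1]));
  return out;
}

// Seeded PRNG (mulberry32) so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generator: strings of length 0..12 over a tiny alphabet that includes a digit.
function genString(rand: () => number): string {
  const alphabet = "ab1";
  const len = Math.floor(rand() * 13);
  let s = "";
  for (let i = 0; i < len; i++) s += alphabet[Math.floor(rand() * alphabet.length)];
  return s;
}

// Greedy shrinker: keep deleting single characters while the property still fails.
function shrink(input: string, holds: (s: string) => boolean): string {
  let current = input;
  let progress = true;
  while (progress) {
    progress = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i) + current.slice(i + 1);
      if (!holds(candidate)) {
        current = candidate;
        progress = true;
        break;
      }
    }
  }
  return current;
}

// Check the property on `runs` generated inputs; return a shrunk counterexample or null.
function checkProperty(holds: (s: string) => boolean, seed: number, runs = 200): string | null {
  const rand = mulberry32(seed);
  for (let r = 0; r < runs; r++) {
    const input = genString(rand);
    if (!holds(input)) return shrink(input, holds);
  }
  return null;
}

// The law: decoding an encoding gives back the original string.
const roundTrips = (s: string): boolean => decode(encode(s)) === s;
```

`decode` never emits digits, so any input containing `1` breaks the round trip. The shrinker reduces whatever failing string it finds to the single character `"1"`. That minimal input is the one worth locking into a regression test.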
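The Fail-loud defaults row describes a specific trap: a silent default that every test overrides. This sketch assumes a hypothetical `Mailer` port and a hypothetical `Container` composition root. It shows the silent no-op next to a fail-loud default, plus the integration-test check the row offers as the alternative.

```typescript
// Hypothetical port that production wiring must bind to a real implementation.
interface Mailer {
  send(to: string, body: string): Promise<void>;
}

// Anti-pattern: a silent no-op default. The test harness always overrides it,
// so CI stays green even if production never binds a real mailer.
const noopMailer: Mailer = {
  async send() {
    // Drops the message without any signal.
  },
};

// Fail-loud default: any call before wiring rejects with an actionable message.
const unwiredMailer: Mailer = {
  send(to: string) {
    return Promise.reject(
      new Error(`Mailer.send("${to}") called before a Mailer was bound; bind one at startup.`),
    );
  },
};

// Hypothetical composition root that starts from the fail-loud default.
class Container {
  mailer: Mailer = unwiredMailer;

  bindMailer(impl: Mailer): void {
    this.mailer = impl;
  }
}

// Integration-test helper: assert the production container bound a real Mailer.
function assertMailerWired(container: Container): void {
  if (container.mailer === unwiredMailer || container.mailer === noopMailer) {
    throw new Error("Mailer is not wired to a real implementation in this container.");
  }
}
```

Either half is enough on its own. The fail-loud default surfaces the gap on first use. The wiring assertion surfaces it in CI before any message is sent.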
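The Quality-gates script row boils down to a parallel fan-out that reports every failure at once, rather than stopping at the first. This sketch of the orchestrator assumes each gate is wrapped as an async function returning findings. The `Gate` and `Finding` shapes are hypothetical; in practice each gate would shell out to its tool. `Promise.allSettled` is used so that one crashing gate cannot hide the results of the others.

```typescript
// One actionable finding: where it is, which rule it breaks, and how to fix it.
interface Finding {
  file: string;
  line: number;
  rule: string;
  fix: string;
}

type Gate = { name: string; run: () => Promise<Finding[]> };

interface GateReport {
  ok: boolean;
  lines: string[];
}

// Run all gates concurrently; a gate that crashes is reported, not swallowed.
async function runQualityGates(gates: Gate[]): Promise<GateReport> {
  const results = await Promise.allSettled(gates.map((g) => g.run()));
  const lines: string[] = [];
  results.forEach((result, i) => {
    const name = gates[i].name;
    if (result.status === "rejected") {
      lines.push(`[${name}] gate crashed: ${String(result.reason)}`);
      return;
    }
    for (const f of result.value) {
      lines.push(`[${name}] ${f.file}:${f.line}: ${f.rule}; ${f.fix}`);
    }
  });
  return { ok: lines.length === 0, lines };
}
```

Each output line names the file, line, rule and fix. An example is `[lint] src/x.ts:42: no any in boundary file; use unknown and parse`, not a bare "lint failed".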
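Two rows share one mechanism: the File-size gate's "baseline shrink-only" rule and the Sanity script's "CI fails on regressions". That mechanism is a committed baseline that may only go down. In this sketch, the baseline format (a map from path to line count) and the line limit are assumptions; the playbook's actual baseline format is not given here.

```typescript
// Baseline committed to the repo: path -> line count when it was recorded.
type Baseline = Record<string, number>;

interface RatchetResult {
  violations: string[]; // actionable messages; any entry fails CI
  nextBaseline: Baseline; // tightened baseline to commit when there are no violations
}

// Shrink-only ratchet: grandfathered oversize files may shrink but never grow;
// files not in the baseline must stay within the hard limit.
function checkRatchet(baseline: Baseline, current: Baseline, limit: number): RatchetResult {
  const violations: string[] = [];
  const nextBaseline: Baseline = {};
  for (const [path, lines] of Object.entries(current)) {
    const allowed: number | undefined = baseline[path];
    if (allowed === undefined) {
      if (lines > limit) {
        violations.push(`${path}: ${lines} lines exceeds the ${limit}-line limit; split the file.`);
      }
      continue;
    }
    if (lines > allowed) {
      violations.push(`${path}: grew from ${allowed} to ${lines} lines; the baseline is shrink-only.`);
    }
    // Still oversize: keep tracking it at the smaller of its old and new size.
    // Back under the limit: drop it from the baseline so it can never regrow.
    if (lines > limit) nextBaseline[path] = Math.min(lines, allowed);
  }
  return { violations, nextBaseline };
}
```

Committing `nextBaseline` after each green run is what makes the ratchet tighten over time. Without that step, the baseline would only ever stop growth, never lock in reductions.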

Non-negotiables

  1. Tests are part of the diff. No "tests next PR".
  2. Coverage is per package, not aggregate. Aggregate hides which package is bad.
  3. Hermetic over E2E for bug repro. Component tests fail in 2s; Playwright fails in 60s and produces more false signals.
  4. Gates produce actionable messages. "Lint failed" is not actionable. "src/x.ts:42 — no any in boundary file; use unknown and parse." is.
  5. Pre-push is the safety net, not the proof. Run check:all before a release.
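Per-package coverage (non-negotiable 2 and the scope table's >90% statements target) is simple to enforce once each package emits its own summary. This sketch makes several assumptions. Each package writes an Istanbul-style `coverage-summary.json` via the `json-summary` reporter, whose `total.statements` entry carries `total`, `covered` and `pct`. A CI step has collected these by package name. The package names are hypothetical. The example shows an aggregate passing while one package fails.

```typescript
// Subset of an Istanbul `json-summary` report (coverage-summary.json).
interface CoverageSummary {
  total: { statements: { total: number; covered: number; pct: number } };
}

// Fail any package whose own statement coverage is not above the threshold
// (the scope table says ">90%", so exactly 90% fails here).
function checkPerPackage(summaries: Record<string, CoverageSummary>, threshold = 90): string[] {
  const failures: string[] = [];
  for (const [pkg, summary] of Object.entries(summaries)) {
    const { pct } = summary.total.statements;
    if (!(pct > threshold)) {
      failures.push(`${pkg}: statements ${pct}% is not above ${threshold}%; add tests in this package.`);
    }
  }
  return failures;
}

// For contrast: the whole-repo aggregate that the per-package check replaces.
function aggregatePct(summaries: Record<string, CoverageSummary>): number {
  let covered = 0;
  let total = 0;
  for (const s of Object.values(summaries)) {
    covered += s.total.statements.covered;
    total += s.total.statements.total;
  }
  return total === 0 ? 100 : (100 * covered) / total;
}
```

Take two packages: one with 980 of 1,000 statements covered and one with 60 of 100. The aggregate is 1,040 / 1,100, about 94.5%, which would pass a whole-repo gate. The per-package check still flags the second package at 60%.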

See also

Documents in this pillar

| Doc | Read when |
|---|---|
| `universal.md` | First read; the 9 non-negotiables |
| `test-pyramid.md` | Test-tier distribution + escalation |
| `quality-gates-pattern.md` | Structural gate suite + orchestrator |
| `pre-push-pattern.md` | Three-tier hook split |
| `sanity-pattern.md` | Cross-cutting audit |
| `mutation-testing-pattern.md` | Beyond coverage |
| `property-fuzz-testing-pattern.md` | Test the laws, not the examples; fuzz the trust boundary |
| `adversarial-bug-hunt-pattern.md` | Find real logic bugs: find → refute → reproduce |
| `fail-loud-defaults-pattern.md` | No-op defaults + over-wired test harness = green CI, broken prod |
| `observability-pattern.md` | Metrics / logs / traces / SLOs |
| `performance-budgets-pattern.md` | Bundle / latency / resource budgets |
| `chaos-engineering-pattern.md` | Controlled fault injection |
| `ci-cd-pipeline-pattern.md` | Commit → prod pipeline; caching; deploy patterns; DB migrations |
| `alerting-runbooks-pattern.md` | SLO burn-rate alerts; runbook 5-section template; tuning loop |
| `cost-optimization-pattern.md` | FinOps; per-tenant attribution; right-sizing; commitments + spot |
| `contract-testing-pattern.md` | Pact + schema-first; consumer-driven contracts; broker; can-i-deploy |
| `product-analytics-experimentation-pattern.md` | Event tracking; funnels + cohorts; A/B experiments; holdouts |
| `agent-eval-framework-pattern.md` | Measuring AI agent quality: deterministic graders + LLM-as-judge + production monitoring; eval set as a versioned asset |