Performance Budgets Pattern
How to keep 'it's fast enough' from drifting into 'why is it slow?'
TL;DR (human)
Performance is a budget, not an afterthought. Three classes of budget: bundle (bytes shipped), latency (p50 / p95 / p99 per surface), resource (queries, allocations, cache hits). Each has a target + a regression gate. Performance work happens on the SLOs that move the needle, not on micro-optimisations that flatter benchmarks.
For agents
Three budget classes
| Class | Examples | Where measured |
|---|---|---|
| Bundle size | JS bundle per route, total page weight, image weight | Build time |
| Latency | p50 / p95 / p99 for HTTP, RPC, DB queries | Production (per observability-pattern.md) |
| Resource | Queries per request, allocations per request, cache hit rate | Production + load tests |
Each has a target. Each has a regression detector.
Bundle size budgets
For web apps, per-route budgets:
- JS (gzipped): home page ≤ 100 KB; authenticated app shell ≤ 300 KB; per-route lazy chunks ≤ 50 KB.
- CSS (gzipped): per page ≤ 30 KB.
- Images: hero images ≤ 100 KB; thumbnails ≤ 10 KB; consider WebP / AVIF.
- Fonts: subset; max 2 weights × 1 family; preload critical.
- Total page weight: ≤ 1 MB above the fold.
Gates:
- Per-route bundle size measured at build (size-limit, bundlewatch, framework-native budgets).
- CI fails if any route exceeds budget.
- Shrink-only baseline for established codebases.
Recipe: route-level code splitting; dynamic imports for non-critical features; tree-shaking; dead-code elimination.
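The gates above can be sketched as a small check that a CI step might run after the build. This is a minimal sketch, not the behaviour of size-limit or bundlewatch: `RouteBundle`, `check_bundles`, and the routes are hypothetical names, and it assumes that a recorded baseline replaces the absolute budget for that route (the "shrink-only" rule for established codebases).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteBundle:
    route: str
    gzipped_bytes: int


def check_bundles(measured, budgets, baseline=None):
    """Return a list of violations; an empty list means the gate passes.

    Assumption: when a route has a recorded baseline, the shrink-only rule
    applies (it may not grow past the baseline); otherwise the absolute
    per-route budget applies.
    """
    baseline = baseline or {}
    violations = []
    for bundle in measured:
        if bundle.route in baseline:
            limit, rule = baseline[bundle.route], "shrink-only baseline"
        elif bundle.route in budgets:
            limit, rule = budgets[bundle.route], "budget"
        else:
            continue  # no budget declared for this route
        if bundle.gzipped_bytes > limit:
            violations.append(
                f"{bundle.route}: {bundle.gzipped_bytes} B exceeds {rule} of {limit} B"
            )
    return violations


budgets = {"/": 100 * 1024, "/app": 300 * 1024}
measured = [RouteBundle("/", 90 * 1024), RouteBundle("/app", 310 * 1024)]
for line in check_bundles(measured, budgets):
    print(line)
```

A CI wrapper would exit non-zero when the returned list is non-empty, which is what makes the gate a hard fail.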
Latency budgets
Per user-facing surface:
| Surface | p95 latency budget |
|---|---|
| Auth (login flow) | < 500 ms |
| Dashboard initial load | < 1 s |
| Standard list-fetch | < 300 ms |
| Write (form submit) | < 500 ms |
| Search (interactive) | < 200 ms |
| Background dispatch (start a job) | < 1 s |
| Long-running job (the dispatch, not the work) | < 200 ms |
Budgets vary by product; calibrate based on user research + competitor benchmarks.
Per-tier breakdown (the budget allocated across layers):
Total p95 budget: 1000 ms
├── DNS + TLS + connection:      100 ms (CDN / edge)
├── Server processing:           400 ms (handler + queries)
├── Response payload + transit:  200 ms (size + network)
└── Browser parse + render:      300 ms (HTML, JS, CSS, paint)

Budgets at each layer compose. Blowing the budget at one layer requires shrinking another.
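The composition rule in the per-tier breakdown can be made concrete with a few lines of arithmetic. This is a sketch only: the layer keys and `remaining_headroom` are hypothetical names, and summing per-layer p95 values is a planning approximation, since percentiles of independent stages do not add up exactly to the end-to-end p95.

```python
# Layer allocations taken from the breakdown above (milliseconds).
LAYER_BUDGETS_MS = {
    "dns_tls_connection": 100,
    "server_processing": 400,
    "payload_transit": 200,
    "browser_parse_render": 300,
}
TOTAL_P95_BUDGET_MS = 1000


def remaining_headroom(measured_ms, layer_budgets=LAYER_BUDGETS_MS,
                       total=TOTAL_P95_BUDGET_MS):
    """Return (total headroom in ms, layers that exceed their own allocation)."""
    over = [
        name for name, budget in layer_budgets.items()
        if measured_ms.get(name, 0) > budget
    ]
    headroom = total - sum(measured_ms.values())
    return headroom, over


# Hypothetical measurements: the server layer overspends by 120 ms, and the
# other layers happen to underspend by the same amount.
measured = {
    "dns_tls_connection": 80,
    "server_processing": 520,
    "payload_transit": 150,
    "browser_parse_render": 250,
}
headroom, over = remaining_headroom(measured)
print(headroom, over)
```

In this example the total still fits, but only because three other layers gave up their slack; any growth in those layers would now breach the 1000 ms total.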
Resource budgets
Per request:
- DB queries: ≤ 10 per request (N+1 detection: > 20 queries = probable N+1).
- DB query time: ≤ 100 ms aggregate per request.
- Cache hit rate (when caching is in play): > 80% in steady state.
- Allocations: track when memory pressure is a concern; flag endpoints with a high allocation rate.
Per worker / job:
- Memory: < 75% of provisioned limit in steady state (room for spikes).
- CPU: < 70% in steady state.
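The per-request thresholds above could be evaluated against request-level stats as follows. This is a minimal sketch: `RequestStats` and `resource_findings` are hypothetical names, and how the query count and DB time are collected (ORM hook, tracing span, database proxy) is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestStats:
    endpoint: str
    query_count: int
    db_time_ms: float


def resource_findings(stats, max_queries=10, n_plus_one_at=20, max_db_ms=100.0):
    """Compare one request's stats with the per-request resource budgets."""
    findings = []
    if stats.query_count > n_plus_one_at:
        findings.append("probable N+1")
    elif stats.query_count > max_queries:
        findings.append("query count over budget")
    if stats.db_time_ms > max_db_ms:
        findings.append("DB time over budget")
    return findings


print(resource_findings(RequestStats("/orders", query_count=37, db_time_ms=80.0)))
```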
Where the budget enforcement lives
Build time: bundle-size gate. Hard fail.
CI integration test: query-count gate per endpoint. Synthetic load tests on staging produce p95 measurements. Fail PR if a measured endpoint regressed > 10%.
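The "regressed > 10%" rule for measured endpoints might look like this in a CI script. It is a sketch under assumptions: `regressed` and `failing_endpoints` are hypothetical names, and both inputs are assumed to be dictionaries of p95 latency in milliseconds keyed by endpoint.

```python
def regressed(baseline_p95_ms, current_p95_ms, tolerance=0.10):
    """True when the current p95 is more than `tolerance` above the baseline."""
    if baseline_p95_ms <= 0:
        raise ValueError("baseline p95 must be positive")
    return (current_p95_ms - baseline_p95_ms) / baseline_p95_ms > tolerance


def failing_endpoints(baseline, current, tolerance=0.10):
    """Return endpoints present in both runs whose p95 regressed past the tolerance."""
    return sorted(
        endpoint for endpoint, p95 in current.items()
        if endpoint in baseline and regressed(baseline[endpoint], p95, tolerance)
    )


baseline = {"/login": 200.0, "/search": 120.0}
current = {"/login": 260.0, "/search": 125.0}
print(failing_endpoints(baseline, current))
```

Endpoints without a baseline are skipped here rather than failed; a stricter gate could require every measured endpoint to have one.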
Production observability: SLO burn rate on latency budgets (per observability-pattern.md). Alert when budget burns faster than expected.
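Burn rate for a latency SLO is commonly computed as the observed fraction of slow requests divided by the fraction the SLO allows. The sketch below assumes an SLO of the form "95% of requests finish under the budget"; `latency_burn_rate` and the sample counts are hypothetical.

```python
def latency_burn_rate(total_requests, slow_requests, slo_target=0.95):
    """Observed slow fraction divided by the slow fraction the SLO allows.

    1.0 means the error budget is being consumed exactly on schedule;
    values above 1.0 mean it will run out before the SLO window ends.
    """
    if total_requests == 0:
        return 0.0
    allowed_slow_fraction = 1.0 - slo_target
    observed_slow_fraction = slow_requests / total_requests
    return observed_slow_fraction / allowed_slow_fraction


# 100 of 1000 requests in the window exceeded the latency budget.
rate = latency_burn_rate(total_requests=1000, slow_requests=100)
print(f"burn rate: {rate:.1f}x")
```

The alert threshold and the window lengths are policy choices that belong with the observability setup rather than in this function.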
Anti-patterns to detect
| Pattern | Signal |
|---|---|
| N+1 query | Per-request query count linearly tied to result count |
| Synchronous fanout to N services | p95 increases with N |
| Hot loop with allocation | GC pressure spikes per request |
| Unbounded result set | Latency increases over time as data grows |
| Missing index | DB CPU climbs; specific query slow |
| Synchronous external call | Tail latency dominated by third-party |
| Render-blocking JS | First Contentful Paint > 2s |
| Large image not lazy-loaded | Above-fold image stalls render |
Each has a recipe to fix; agents can match symptom to recipe.
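As one example of matching a symptom to a pattern, the N+1 signal (query count linearly tied to result count) can be tested with a least-squares slope over sampled requests. This is a sketch: `query_growth_slope`, `looks_like_n_plus_one`, and the `min_slope` threshold of 0.5 are assumptions chosen for illustration, not calibrated values.

```python
def query_growth_slope(samples):
    """Least-squares slope of query count against result count.

    `samples` is a list of (result_count, query_count) pairs.
    """
    xs = [result_count for result_count, _ in samples]
    ys = [query_count for _, query_count in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0  # every sample has the same result count; no trend visible
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var_x


def looks_like_n_plus_one(samples, min_slope=0.5):
    """Flag endpoints where each extra result costs roughly one extra query."""
    return query_growth_slope(samples) >= min_slope


n_plus_one = [(10, 11), (50, 51), (100, 101)]  # one query per row, plus one
batched = [(10, 3), (50, 3), (100, 3)]         # constant query count
print(looks_like_n_plus_one(n_plus_one), looks_like_n_plus_one(batched))
```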
Performance work prioritisation
Not all performance issues are worth fixing. Prioritise by:
- User impact: how many users hit it; how often?
- Budget burn: is the SLO at risk?
- Cost: is the slow path also expensive (queries, compute)?
Anti-prioritisation: optimising a path that runs once per week to save 5 ms is noise. Optimising the dashboard load that every user hits 100×/day is high-value.
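The contrast above is simple arithmetic once run frequency is written down. In this sketch the weekly job figures come from the text, while the user count (1,000) and the 50 ms dashboard saving are hypothetical numbers chosen only to show the scale difference.

```python
def weekly_time_saved_ms(runs_per_week, saving_ms_per_run):
    """Total user-facing time saved per week by one optimisation."""
    return runs_per_week * saving_ms_per_run


# From the text: a path that runs once per week, saving 5 ms.
weekly_job = weekly_time_saved_ms(runs_per_week=1, saving_ms_per_run=5)

# Hypothetical: 1,000 users, each loading the dashboard 100 times per day,
# with an assumed 50 ms saving per load.
HYPOTHETICAL_USERS = 1_000
dashboard = weekly_time_saved_ms(
    runs_per_week=HYPOTHETICAL_USERS * 100 * 7,
    saving_ms_per_run=50,
)

print(weekly_job, "ms vs", dashboard / 1000, "s per week")
```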
Load testing
Synthetic load tests, run periodically:
- Soak tests: steady load for hours; surfaces memory leaks, connection-pool exhaustion, cache eviction.
- Spike tests: sudden 10× load; surfaces rate-limit gaps, queue-depth blow-ups.
- Ramp tests: gradual climb; surfaces the point where p95 explodes.
Tools: k6, Vegeta, Locust, Gatling.
Load tests run against staging with production-like data shapes. CI-integrated tests for critical paths; longer tests pre-release.
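For ramp tests, "the point where p95 explodes" can be located from the per-step results after the run. A minimal sketch, assuming the load tool exports one (requests per second, p95 in ms) pair per ramp step; `find_knee` and the 2× growth threshold are hypothetical choices, not features of k6, Vegeta, Locust, or Gatling.

```python
def find_knee(steps, max_growth=2.0):
    """Return the load level of the first step whose p95 grew by more than
    `max_growth` times relative to the previous step, or None if none did.

    `steps` is a list of (requests_per_second, p95_ms), ordered by load.
    """
    for (_, prev_p95), (rps, p95) in zip(steps, steps[1:]):
        if prev_p95 > 0 and p95 / prev_p95 > max_growth:
            return rps
    return None


# Hypothetical ramp results: latency is flat until the 800 rps step.
ramp = [(100, 120.0), (200, 130.0), (400, 150.0), (800, 900.0)]
print(find_knee(ramp))
```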
Real-user monitoring (RUM)
Production gives the truth synthetic tests cannot:
- Per-user p50/p95/p99 latency.
- Geographic breakdown.
- Device breakdown (mobile / desktop / connection class).
- Per-route Core Web Vitals (LCP, INP, CLS).
RUM data feeds SLO calculation. Synthetic load tests catch regressions before users see them; RUM catches what the synthetic tests missed.
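The p50/p95/p99 figures and the per-device breakdown can be derived from raw RUM samples as sketched here. This uses the nearest-rank percentile definition; RUM vendors may interpolate differently. `percentile`, `summarise_by`, and the sample record shape are hypothetical.

```python
import math
from collections import defaultdict


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarise_by(records, key):
    """Group latency samples by `key` and report p50 / p95 / p99 per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["latency_ms"])
    return {
        name: {f"p{p}": percentile(values, p) for p in (50, 95, 99)}
        for name, values in groups.items()
    }


# Hypothetical samples: desktop latencies 1..100 ms, plus two mobile samples.
records = [{"device": "desktop", "latency_ms": ms} for ms in range(1, 101)]
records += [{"device": "mobile", "latency_ms": ms} for ms in (300, 900)]
print(summarise_by(records, "device"))
```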
Performance as a feature
Communicating performance to users:
- Optimistic UI: render the new state immediately; reconcile after.
- Skeleton loading: shows structure within 100 ms (per ../ui-ux/universal.md Rule 4).
- Streaming results: don't wait for the full payload to render.
- Background work + progress: tell the user it is happening; estimate completion.
Perceived performance > measured performance. A 2-second operation that feels instant beats a 1-second operation that feels slow.
Common failure modes
- No budget at all. "It's fast enough." Until it isn't. → Document budgets; gate regressions.
- Budgets that no one reviews. Budget creeps; nobody notices. → Budget review at release time.
- Micro-optimisations that don't move the needle. Optimised a 5 ms function nobody hits. → Measure user-perceived latency; prioritise by impact.
- Bundle-size gate without per-route detail. Total goes up by 2 KB; you don't know which route. → Per-route budgets.
- Synthetic-only measurement. Tests say fast; users say slow. → RUM mandatory; sample real users.
- Performance work that breaks tests. Speed at expense of correctness. → Performance budget is part of the contract; correctness is not negotiable.
Tooling stack (typical)
| Concern | Tool |
|---|---|
| Bundle analyzer | webpack-bundle-analyzer, source-map-explorer, next bundle |
| Bundle gate | size-limit, bundlewatch, framework-native |
| Synthetic load | k6, Vegeta, Locust |
| RUM | Sentry, Datadog RUM, NewRelic Browser, CrUX |
| Profiling (server) | clinic.js, perf, py-spy, async-profiler (Java) |
| Profiling (web) | Chrome DevTools Performance, React Profiler |
| Core Web Vitals | Lighthouse CI, web-vitals lib |
See also
- observability-pattern.md: measurement infrastructure.
- ../architecture/anti-overengineering.md: premature optimisation reminder.
- ../architecture/distributed-data-pattern.md: caching tiers; replica routing.
- chaos-engineering-pattern.md: load tests + fault injection.