Performance Budgets Pattern

How to keep "it's fast enough" from drifting into "why is it slow?"

TL;DR (human)

Performance is a budget, not an afterthought. Three classes of budget: bundle (bytes shipped), latency (p50 / p95 / p99 per surface), resource (queries, allocations, cache hits). Each has a target + a regression gate. Performance work happens on the SLOs that move the needle, not on micro-optimisations that flatter benchmarks.

For agents

Three budget classes

Class	Examples	Where measured
Bundle size	JS bundle per route, total page weight, image weight	Build time
Latency	p50 / p95 / p99 for HTTP, RPC, DB queries	Production (per `observability-pattern.md`)
Resource	Queries per request, allocations per request, cache hit rate	Production + load tests

Each has a target. Each has a regression detector.

Bundle size budgets

For web apps, per-route budgets:

JS (gzipped): home page ≤ 100 KB; authenticated app shell ≤ 300 KB; per-route lazy chunks ≤ 50 KB.
CSS (gzipped): per page ≤ 30 KB.
Images: hero images ≤ 100 KB; thumbnails ≤ 10 KB; consider WebP / AVIF.
Fonts: subset; max 2 weights × 1 family; preload critical.
Total page weight: ≤ 1 MB above the fold.

Gates:

Per-route bundle size measured at build (size-limit, bundlewatch, framework-native budgets).
CI fails if any route exceeds budget.
Shrink-only baseline for established codebases.

Recipe: route-level code splitting; dynamic imports for non-critical features; tree-shaking; dead-code elimination.
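
The gate logic above, including the shrink-only baseline, can be sketched as follows. This is a minimal illustration, assuming the build has already produced gzipped JS sizes per route; the route names, the function name `check_bundles`, and the choice to treat unlisted routes as lazy chunks are assumptions made for the example. In practice a tool such as size-limit or bundlewatch would do this job.

```python
# Per-route JS budgets in KB (gzipped), taken from the list above.
# The route paths themselves are hypothetical.
BUDGETS_KB = {"/": 100, "/app": 300}
LAZY_CHUNK_BUDGET_KB = 50  # assumed default for any route not listed


def check_bundles(measured_kb, baseline_kb=None):
    """Return a list of violations; an empty list means the gate passes.

    measured_kb: {route: gzipped JS size in KB} from the current build.
    baseline_kb: optional {route: size} from the main branch. A route that
                 is already over budget there may not grow (shrink-only),
                 but is not failed merely for being over budget.
    """
    baseline_kb = baseline_kb or {}
    violations = []
    for route, size in sorted(measured_kb.items()):
        budget = BUDGETS_KB.get(route, LAZY_CHUNK_BUDGET_KB)
        previous = baseline_kb.get(route)
        if previous is not None and previous > budget:
            # Established codebase: already over budget, so only forbid growth.
            if size > previous:
                violations.append(
                    f"{route}: grew {previous} -> {size} KB (shrink-only)"
                )
        elif size > budget:
            violations.append(f"{route}: {size} KB exceeds budget of {budget} KB")
    return violations
```

A CI step would exit non-zero whenever the returned list is non-empty.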

Latency budgets

Per user-facing surface:

Surface	p95 latency budget
Auth (login flow)	< 500 ms
Dashboard initial load	< 1 s
Standard list-fetch	< 300 ms
Write (form submit)	< 500 ms
Search (interactive)	< 200 ms
Background dispatch (start a job)	< 1 s
Long-running job (the dispatch, not the work)	< 200 ms

Budgets vary by product; calibrate based on user research + competitor benchmarks.

Per-tier breakdown (the budget allocated across layers):

Total p95 budget: 1000 ms
├── DNS + TLS + connection:      100 ms (CDN / edge)
├── Server processing:           400 ms (handler + queries)
├── Response payload + transit:  200 ms (size + network)
└── Browser parse + render:      300 ms (HTML, JS, CSS, paint)

Budgets at each layer compose. Blowing the budget at one layer requires shrinking another.
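
The composition rule can be checked with simple arithmetic. The sketch below uses the tier allocations from the breakdown above; the tier keys and the function name `remaining_after` are illustrative. It treats per-tier p95 values as additive, which is a simplification (percentiles do not strictly add), but it is good enough for budgeting.

```python
TOTAL_BUDGET_MS = 1000
TIER_BUDGETS_MS = {
    "connection": 100,  # DNS + TLS + connection
    "server": 400,      # handler + queries
    "transit": 200,     # response payload + network
    "browser": 300,     # parse + render
}


def remaining_after(measured_ms):
    """Given measured p95 per tier, return how much total budget is left.

    Tiers without a measurement are assumed to consume their full allocation.
    A negative result is the amount the other tiers must give back.
    """
    spent = sum(
        measured_ms.get(tier, budget) for tier, budget in TIER_BUDGETS_MS.items()
    )
    return TOTAL_BUDGET_MS - spent
```

For example, `remaining_after({"server": 550})` returns `-150`: a server tier that takes 550 ms leaves the other layers 150 ms to find.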

Resource budgets

Per request:

DB queries: ≤ 10 per request (N+1 detection: > 20 queries = probable N+1).
DB query time: ≤ 100 ms aggregate per request.
Cache hit rate (when caching is in play): > 80% in steady state.
Allocations: track when memory pressure is a concern; flag endpoints with a high allocation rate.
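
One way to measure queries per request is to count statements at the driver level. The sketch below uses Python's standard `sqlite3` module and its `set_trace_callback` hook purely as a stand-in; a real service would more likely hook its ORM or database driver instrumentation. The helper names `count_queries` and `classify` are hypothetical. Note that the trace callback also fires for implicit transaction statements, so counts around writes can be slightly higher than the number of `execute` calls.

```python
import sqlite3
from contextlib import contextmanager

MAX_QUERIES_PER_REQUEST = 10  # budget from the list above
PROBABLE_N_PLUS_ONE = 20      # detection threshold from the list above


@contextmanager
def count_queries(conn):
    """Count SQL statements executed on `conn` inside the with-block."""
    counter = {"n": 0}

    def on_statement(_sql):
        counter["n"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)  # always detach the hook


def classify(query_count):
    """Map a per-request query count onto the budget categories."""
    if query_count > PROBABLE_N_PLUS_ONE:
        return "probable N+1"
    if query_count > MAX_QUERIES_PER_REQUEST:
        return "over budget"
    return "ok"
```

In an integration test, the request handler would run inside `with count_queries(conn) as c:` and the test would assert on `classify(c["n"])`.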

Per worker / job:

Memory: < 75% of provisioned limit in steady state (room for spikes).
CPU: < 70% in steady state.

Where the budget enforcement lives

Build time: bundle-size gate. Hard fail.

CI integration test: query-count gate per endpoint. Synthetic load tests on staging produce p95 measurements. Fail the PR if a measured endpoint's p95 regressed by more than 10% against the baseline.
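
The 10% regression check can be sketched as below, assuming the load test yields a list of latency samples for both the baseline and the candidate build. The nearest-rank percentile method and the function names are choices made for this example; load-testing tools may compute percentiles differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def regressed(baseline_ms, candidate_ms, tolerance=0.10):
    """True if the candidate p95 is more than `tolerance` above the baseline p95."""
    return percentile(candidate_ms, 95) > percentile(baseline_ms, 95) * (1 + tolerance)
```

Comparing against a stored baseline rather than a fixed threshold keeps the gate meaningful for endpoints that sit well inside their budget.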

Production observability: SLO burn rate on latency budgets (per observability-pattern.md). Alert when budget burns faster than expected.
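
Burn rate for a latency SLO is commonly defined as the observed fraction of slow requests divided by the fraction the SLO allows. A minimal sketch, assuming an SLO of the form "95% of requests finish within the threshold"; the 95% target and the function name are assumptions, and the pattern itself defers the details to observability-pattern.md.

```python
def latency_burn_rate(latencies_ms, threshold_ms, slo_target=0.95):
    """Burn rate = observed bad fraction / allowed bad fraction.

    A request is "bad" when it is slower than threshold_ms. With a 95%
    target, 5% of requests may be bad: a burn rate of 1.0 consumes the
    budget exactly as fast as allowed, 2.0 consumes it twice as fast.
    """
    if not latencies_ms:
        return 0.0
    bad = sum(1 for ms in latencies_ms if ms > threshold_ms)
    allowed_bad_fraction = 1.0 - slo_target
    return (bad / len(latencies_ms)) / allowed_bad_fraction
```

An alert would fire when the burn rate over some window stays above a chosen multiple; the window and multiple are product decisions not specified here.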

Anti-patterns to detect

Pattern	Signal
N+1 query	Per-request query count linearly tied to result count
Synchronous fanout to N services	p95 increases with N
Hot loop with allocation	GC pressure spikes per request
Unbounded result set	Latency increases over time as data grows
Missing index	DB CPU climbs; specific query slow
Synchronous external call	Tail latency dominated by third-party
Render-blocking JS	First Contentful Paint > 2s
Large image not lazy-loaded	Above-fold image stalls render

Each has a recipe to fix; agents can match symptom to recipe.
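
As an example of matching a symptom, the N+1 signal ("query count linearly tied to result count") can be tested with a least-squares slope over per-request samples. This is a sketch; the function names and the `min_slope` threshold of 0.5 are assumptions, not values from the pattern.

```python
def query_slope(samples):
    """Least-squares slope of query_count against result_count.

    samples: list of (result_count, query_count) pairs, one per request.
    A slope near 0 means query count does not depend on result size; a
    slope near 1 or above means roughly one extra query per returned row.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        return 0.0  # every request returned the same row count: no signal
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return cov / var_x


def looks_like_n_plus_one(samples, min_slope=0.5):
    # min_slope is an assumed threshold for illustration.
    return len(samples) >= 2 and query_slope(samples) >= min_slope
```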

Performance work prioritisation

Not all performance issues are worth fixing. Prioritise by:

User impact: how many users hit it; how often?
Budget burn: is the SLO at risk?
Cost: is the slow path also expensive (queries, compute)?

Anti-prioritisation: optimising a path that runs once per week to save 5 ms is noise. Optimising the dashboard load that every user hits 100×/day is high-value.
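
The gap between those two cases is easy to put in numbers. In the sketch below, the run frequencies and the 5 ms saving come from the text; the 1,000 users and the 50 ms saved per dashboard load are assumed figures for illustration only.

```python
def weekly_time_saved_ms(runs_per_week, ms_saved_per_run):
    """Total user-facing time saved per week by an optimisation."""
    return runs_per_week * ms_saved_per_run


# Weekly path from the text: one run per week, 5 ms saved.
weekly_job = weekly_time_saved_ms(1, 5)

# Dashboard: 100 loads per user per day (from the text), 7 days,
# 1,000 users and 50 ms saved per load (both assumed).
dashboard = weekly_time_saved_ms(100 * 7 * 1_000, 50)
```

Under these assumptions the weekly path saves 5 ms per week while the dashboard saves 35,000,000 ms (about 9.7 hours) of user time per week.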

Load testing

Synthetic load tests, run periodically:

Soak tests: steady load for hours; surfaces memory leaks, connection-pool exhaustion, cache eviction.
Spike tests: sudden 10× load; surfaces rate-limit gaps, queue-depth blow-ups.
Ramp tests: gradual climb; surfaces the point where p95 explodes.

Tools: k6, Vegeta, Locust, Gatling.

Load tests run against staging with production-like data shapes. CI-integrated tests for critical paths; longer tests pre-release.
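
Ramp test results can be reduced to a single capacity number: the highest load level at which p95 still met its budget. A minimal sketch, assuming the load tool exports one (requests per second, p95) pair per ramp stage; the function name and data shape are hypothetical.

```python
def find_breaking_point(steps, p95_budget_ms):
    """Return the last load level (requests/s) whose p95 stayed within budget.

    steps: list of (requests_per_second, p95_ms), one per ramp stage.
    Returns None if even the lowest load level is over budget.
    """
    last_ok = None
    for rps, p95_ms in sorted(steps):
        if p95_ms > p95_budget_ms:
            break  # first stage over budget: everything above it is suspect
        last_ok = rps
    return last_ok
```

Tracking this number across releases shows whether headroom is shrinking before production traffic reaches the knee.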

Real-user monitoring (RUM)

Production gives the truth synthetic tests cannot:

Per-user p50/p95/p99 latency.
Geographic breakdown.
Device breakdown (mobile / desktop / connection class).
Per-route Core Web Vitals (LCP, INP, CLS).

RUM data feeds SLO calculation. Synthetic load tests catch regressions before users see them; RUM catches what synthetic tests missed.
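
A sketch of how RUM samples might be rated per segment. It uses the published Core Web Vitals thresholds (LCP 2.5 s / 4 s, INP 200 ms / 500 ms, CLS 0.1 / 0.25), assessed at the 75th percentile; the function names, the nearest-rank percentile, and the sample shape are assumptions for the example.

```python
import math

# (good_max, needs_improvement_max) per metric.
THRESHOLDS = {
    "LCP": (2500, 4000),  # ms
    "INP": (200, 500),    # ms
    "CLS": (0.1, 0.25),   # unitless layout-shift score
}


def p75(values):
    """Nearest-rank 75th percentile."""
    ordered = sorted(values)
    rank = math.ceil(75 * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def rate(metric, values):
    """Rate a metric as good / needs improvement / poor at p75."""
    good_max, ni_max = THRESHOLDS[metric]
    value = p75(values)
    if value <= good_max:
        return "good"
    if value <= ni_max:
        return "needs improvement"
    return "poor"


def rate_by_segment(metric, samples):
    """samples: list of (segment, value), e.g. segment = "mobile" or "desktop"."""
    groups = {}
    for segment, value in samples:
        groups.setdefault(segment, []).append(value)
    return {segment: rate(metric, values) for segment, values in groups.items()}
```

Segmenting before rating matters because an aggregate "good" can hide a "poor" experience on mobile or on slow connections.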

Performance as a feature

Communicating performance to users:

Optimistic UI: render the new state immediately; reconcile after.
Skeleton loading: shows structure within 100ms (per ../ui-ux/universal.md Rule 4).
Streaming results: don't wait for the full payload to render.
Background work + progress: tell the user it is happening; estimate completion.

Perceived performance > measured performance. A 2-second operation that feels instant beats a 1-second operation that feels slow.

Common failure modes

No budget at all. "It's fast enough." Until it isn't. → Document budgets; gate regressions.
Budgets that no one reviews. Budget creeps; nobody notices. → Budget review at release time.
Micro-optimisations that don't move the needle. Optimised a 5 ms function nobody hits. → Measure user-perceived latency; prioritise by impact.
Bundle-size gate without per-route detail. Total goes up by 2 KB; you don't know which route. → Per-route budgets.
Synthetic-only measurement. Tests say fast; users say slow. → RUM mandatory; sample real users.
Performance work that breaks tests. Speed at expense of correctness. → Performance budget is part of the contract; correctness is not negotiable.

Tooling stack (typical)

Concern	Tool
Bundle analyzer	webpack-bundle-analyzer, source-map-explorer, `@next/bundle-analyzer`
Bundle gate	size-limit, bundlewatch, framework-native
Synthetic load	k6, Vegeta, Locust
RUM	Sentry, Datadog RUM, New Relic Browser, CrUX
Profiling (server)	clinic.js, perf, py-spy, async-profiler (Java)
Profiling (web)	Chrome DevTools Performance, React Profiler
Core Web Vitals	Lighthouse CI, web-vitals lib
