Performance Budgets Pattern

How to keep 'it's fast enough' from drifting into 'why is it slow?'

TL;DR (human)

Performance is a budget, not an afterthought. Three classes of budget: bundle (bytes shipped), latency (p50 / p95 / p99 per surface), resource (queries, allocations, cache hits). Each has a target + a regression gate. Performance work happens on the SLOs that move the needle, not on micro-optimisations that flatter benchmarks.

For agents

Three budget classes

| Class | Examples | Where measured |
| --- | --- | --- |
| Bundle size | JS bundle per route, total page weight, image weight | Build time |
| Latency | p50 / p95 / p99 for HTTP, RPC, DB queries | Production (per observability-pattern.md) |
| Resource | Queries per request, allocations per request, cache hit rate | Production + load tests |

Each has a target. Each has a regression detector.

Bundle size budgets

For web apps, per-route budgets:

  • JS (gzipped): home page ≤ 100 KB; authenticated app shell ≤ 300 KB; per-route lazy chunks ≤ 50 KB.
  • CSS (gzipped): per page ≤ 30 KB.
  • Images: hero images ≤ 100 KB; thumbnails ≤ 10 KB; consider WebP / AVIF.
  • Fonts: subset; max 2 weights × 1 family; preload critical.
  • Total page weight: ≤ 1 MB above the fold.

Gates:

  • Per-route bundle size measured at build (size-limit, bundlewatch, framework-native budgets).
  • CI fails if any route exceeds budget.
  • Shrink-only baseline for established codebases.
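The gates above can be sketched as a small check that a CI step might run after the build. This is a minimal illustration, not the behaviour of size-limit or bundlewatch: the route names, the `KB = 1024` convention, and the function names are assumptions made for the example.

```python
import gzip

KB = 1024  # assumption: budgets are expressed in units of 1024 bytes

# Gzipped JS budgets from the list above, keyed by hypothetical route names.
JS_BUDGETS = {"home": 100 * KB, "app-shell": 300 * KB}


def gzipped_size(data: bytes) -> int:
    """Size in bytes of `data` after gzip compression."""
    return len(gzip.compress(data))


def over_budget(sizes: dict[str, int], budgets: dict[str, int]) -> list[str]:
    """Routes that exceed their budget; a route with no budget also fails."""
    return sorted(
        route
        for route, size in sizes.items()
        if route not in budgets or size > budgets[route]
    )


def ratchet(baseline: dict[str, int], measured: dict[str, int]) -> dict[str, int]:
    """Shrink-only baseline: a limit may move down to the measured size, never up."""
    return {
        route: min(limit, measured.get(route, limit))
        for route, limit in baseline.items()
    }
```

A CI step would fail the build when `over_budget` returns a non-empty list. On an established codebase, it would write the `ratchet` result back as the new baseline after each green build.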

Recipe: route-level code splitting; dynamic imports for non-critical features; tree-shaking; dead-code elimination.

Latency budgets

Per user-facing surface:

| Surface | p95 latency budget |
| --- | --- |
| Auth (login flow) | < 500 ms |
| Dashboard initial load | < 1 s |
| Standard list-fetch | < 300 ms |
| Write (form submit) | < 500 ms |
| Search (interactive) | < 200 ms |
| Background dispatch (start a job) | < 1 s |
| Long-running job (the dispatch, not the work) | < 200 ms |

Budgets vary by product; calibrate based on user research + competitor benchmarks.

Per-tier breakdown (the budget allocated across layers):

Total p95 budget: 1000 ms
├── DNS + TLS + connection:      100 ms (CDN / edge)
├── Server processing:           400 ms (handler + queries)
├── Response payload + transit:  200 ms (size + network)
└── Browser parse + render:      300 ms (HTML, JS, CSS, paint)

The layer budgets sum to the total. Overspending at one layer means another layer has to shrink to compensate.
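As a rough sketch of that trade-off, the snippet below compares measured per-layer timings against the breakdown above and reports the remaining headroom. It treats per-layer figures as additive, which is the same simplification the breakdown makes; in real data, per-layer p95 values do not sum exactly to the end-to-end p95. The layer keys and function names are illustrative.

```python
# Per-layer shares of the 1000 ms budget, from the breakdown above.
LAYER_BUDGET_MS = {
    "connection": 100,
    "server": 400,
    "transit": 200,
    "browser": 300,
}
TOTAL_BUDGET_MS = 1000


def layer_overruns(measured_ms: dict[str, float]) -> dict[str, float]:
    """Per layer, measured minus budget (positive means over budget)."""
    return {
        layer: measured_ms[layer] - budget
        for layer, budget in LAYER_BUDGET_MS.items()
    }


def total_headroom_ms(measured_ms: dict[str, float]) -> float:
    """Remaining budget across all layers (negative means the total is blown)."""
    return TOTAL_BUDGET_MS - sum(measured_ms[layer] for layer in LAYER_BUDGET_MS)
```

For example, a server layer at 550 ms with transit at 150 ms gives overruns of +150 and -50, so the total is still 100 ms over and some other layer has to give up the difference.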

Resource budgets

Per request:

  • DB queries: ≤ 10 per request (N+1 detection: > 20 queries = probable N+1).
  • DB query time: ≤ 100 ms aggregate per request.
  • Cache hit rate (when caching is in play): > 80% in steady state.
  • Allocations: track when memory pressure is a concern; flag specific endpoints with a high allocation rate.
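The per-request query limits can be sketched as a small collector that an integration test might attach to a request. `RequestStats` and its method names are hypothetical; how `record` gets called depends on the database driver or ORM hooks available, which are not shown here.

```python
from dataclasses import dataclass, field

QUERY_COUNT_BUDGET = 10      # from the list above
N_PLUS_ONE_THRESHOLD = 20    # from the list above
QUERY_TIME_BUDGET_MS = 100   # from the list above


@dataclass
class RequestStats:
    """Collects per-query timings for a single request."""

    query_ms: list[float] = field(default_factory=list)

    def record(self, elapsed_ms: float) -> None:
        """Call once per executed query with its duration in milliseconds."""
        self.query_ms.append(elapsed_ms)

    def violations(self) -> list[str]:
        """Budget violations for this request; an empty list means within budget."""
        found = []
        count = len(self.query_ms)
        if count > N_PLUS_ONE_THRESHOLD:
            found.append("probable N+1")
        elif count > QUERY_COUNT_BUDGET:
            found.append("query count over budget")
        if sum(self.query_ms) > QUERY_TIME_BUDGET_MS:
            found.append("query time over budget")
        return found
```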

Per worker / job:

  • Memory: < 75% of provisioned limit in steady state (room for spikes).
  • CPU: < 70% in steady state.

Where the budget enforcement lives

Build time: bundle-size gate. Hard fail.

CI integration test: query-count gate per endpoint. Synthetic load tests on staging produce p95 measurements per endpoint. Fail the PR if any measured endpoint's p95 regresses by more than 10% against its baseline.
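A minimal sketch of the 10% regression check, assuming the baseline and current p95 values are already available as plain mappings from endpoint to milliseconds. The function name and the choice to skip endpoints without a baseline are assumptions of this example.

```python
def regressions(
    baseline_p95_ms: dict[str, float],
    current_p95_ms: dict[str, float],
    tolerance: float = 0.10,
) -> dict[str, float]:
    """Endpoints whose p95 grew by more than `tolerance`, with their relative change."""
    flagged = {}
    for endpoint, base in baseline_p95_ms.items():
        current = current_p95_ms.get(endpoint)
        if current is None or base <= 0:
            continue  # nothing to compare against
        change = (current - base) / base
        if change > tolerance:
            flagged[endpoint] = change
    return flagged
```

With a baseline of 200 ms, a current p95 of 230 ms is a 15% regression and would be flagged; 220 ms is exactly 10% and would pass.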

Production observability: SLO burn rate on latency budgets (per observability-pattern.md). Alert when budget burns faster than expected.
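For illustration, burn rate is commonly computed as the observed fraction of bad events divided by the fraction the SLO allows. The sketch below applies that convention to a latency budget, assuming the SLO is "95% of requests finish within the budget"; observability-pattern.md may define the target or the window differently.

```python
def burn_rate(
    latencies_ms: list[float],
    budget_ms: float,
    slo_target: float = 0.95,
) -> float:
    """Observed slow-request fraction divided by the fraction the SLO allows.

    1.0 means the error budget is being spent exactly as fast as allowed;
    values above 1.0 mean it will run out before the SLO window ends.
    """
    if not latencies_ms:
        return 0.0
    slow = sum(1 for ms in latencies_ms if ms > budget_ms)
    allowed_fraction = 1.0 - slo_target
    return (slow / len(latencies_ms)) / allowed_fraction
```

If 10 of 100 requests exceed a 300 ms budget under a 95% target, the burn rate is about 2: the budget is being consumed twice as fast as the SLO permits.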

Anti-patterns to detect

| Pattern | Signal |
| --- | --- |
| N+1 query | Per-request query count linearly tied to result count |
| Synchronous fanout to N services | p95 increases with N |
| Hot loop with allocation | GC pressure spikes per request |
| Unbounded result set | Latency increases over time as data grows |
| Missing index | DB CPU climbs; specific query slow |
| Synchronous external call | Tail latency dominated by third-party |
| Render-blocking JS | First Contentful Paint > 2 s |
| Large image not lazy-loaded | Above-fold image stalls render |

Each has a recipe to fix; agents can match symptom to recipe.
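As one example of matching a symptom, the N+1 signal (query count growing linearly with result count) can be tested with a least-squares slope over observed requests. This is a sketch: the `min_slope` threshold of 0.9 is an assumed cut-off, and the sample format is hypothetical.

```python
def query_slope(samples: list[tuple[int, int]]) -> float:
    """Least-squares slope of query count against result count.

    `samples` holds (result_count, query_count) pairs, one per request.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        return 0.0  # every request returned the same number of rows
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return cov / var_x


def looks_like_n_plus_one(samples: list[tuple[int, int]], min_slope: float = 0.9) -> bool:
    """True when roughly one extra query is issued per extra result row."""
    return query_slope(samples) >= min_slope
```

A slope near 1 suggests one extra query per row, the classic N+1 shape. A slope near 0 means the query count does not depend on result size.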

Performance work prioritisation

Not all performance issues are worth fixing. Prioritise by:

  1. User impact: how many users hit it; how often?
  2. Budget burn: is the SLO at risk?
  3. Cost: is the slow path also expensive (queries, compute)?

Anti-prioritisation: optimising a path that runs once per week to save 5 ms is noise. Optimising the dashboard load that every user hits 100×/day is high-value.
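The arithmetic behind that comparison can be made explicit. The sketch below ranks candidates by total user-facing time saved per week. It covers only the user-impact factor; budget burn and cost would need their own inputs. The figures of 1,000 users and a 50 ms saving on the dashboard are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A possible optimisation and how often its code path runs."""

    name: str
    runs_per_week: int
    ms_saved_per_run: float

    @property
    def weekly_ms_saved(self) -> float:
        return self.runs_per_week * self.ms_saved_per_run


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Highest total time saved first."""
    return sorted(candidates, key=lambda c: c.weekly_ms_saved, reverse=True)


# Hypothetical inputs: 1,000 users, each loading the dashboard 100 times a day.
weekly_job = Candidate("weekly job", runs_per_week=1, ms_saved_per_run=5)
dashboard = Candidate("dashboard load", runs_per_week=1000 * 100 * 7, ms_saved_per_run=50)
```

Under these assumptions the weekly job saves 5 ms per week while the dashboard saves 35,000,000 ms (roughly 9.7 hours) of user time per week.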

Load testing

Synthetic load tests, run periodically:

  • Soak tests: steady load for hours; surfaces memory leaks, connection-pool exhaustion, cache eviction.
  • Spike tests: sudden 10× load; surfaces rate-limit gaps, queue-depth blow-ups.
  • Ramp tests: gradual climb; surfaces the point where p95 explodes.

Tools: k6, Vegeta, Locust, Gatling.

Load tests run against staging with production-like data shapes. CI-integrated tests for critical paths; longer tests pre-release.
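To illustrate how a ramp test's output might be read, the sketch below computes a nearest-rank p95 for each load step and returns the first step where it exceeds the budget. The input shape (a list of request rate and latency samples per step) is an assumption; k6, Vegeta, Locust, and Gatling each export results in their own formats.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def breaking_point(steps: list[tuple[int, list[float]]], budget_ms: float):
    """First requests-per-second step whose p95 exceeds the budget, or None.

    `steps` is a list of (requests_per_second, latencies_ms) in ramp order.
    """
    for rps, latencies_ms in steps:
        if percentile(latencies_ms, 95) > budget_ms:
            return rps
    return None
```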

Real-user monitoring (RUM)

Production gives the truth synthetic tests cannot:

  • Per-user p50/p95/p99 latency.
  • Geographic breakdown.
  • Device breakdown (mobile / desktop / connection class).
  • Per-route Core Web Vitals (LCP, INP, CLS).

RUM data feeds SLO calculation. The two are complementary: synthetic load tests catch problems before real users hit them, and RUM catches what the synthetic tests missed.
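A minimal sketch of the per-segment breakdown, assuming RUM samples arrive as (segment, latency in ms) pairs, where the segment could be a device class, region, or route. It uses `statistics.quantiles` from the standard library; RUM products typically compute these server-side, so this only shows the shape of the calculation.

```python
from collections import defaultdict
from statistics import quantiles


def summarise(samples):
    """Return p50 / p95 / p99 per segment from (segment, latency_ms) pairs."""
    by_segment = defaultdict(list)
    for segment, latency_ms in samples:
        by_segment[segment].append(latency_ms)

    summary = {}
    for segment, values in by_segment.items():
        if len(values) < 2:
            continue  # too few samples to summarise
        cuts = quantiles(values, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
        summary[segment] = {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
    return summary
```

Segments with very few samples produce unstable tail percentiles, so a real pipeline would likely enforce a much higher minimum sample count than the two used here.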

Performance as a feature

Communicating performance to users:

  • Optimistic UI: render the new state immediately; reconcile after.
  • Skeleton loading: show structure within 100 ms (per ../ui-ux/universal.md Rule 4).
  • Streaming results: don't wait for the full payload to render.
  • Background work + progress: tell the user it is happening; estimate completion.

Perceived performance > measured performance. A 2-second operation that feels instant beats a 1-second operation that feels slow.

Common failure modes

  • No budget at all. "It's fast enough." Until it isn't. → Document budgets; gate regressions.
  • Budgets that no one reviews. Budget creeps; nobody notices. → Budget review at release time.
  • Micro-optimisations that don't move the needle. Optimised a 5 ms function nobody hits. → Measure user-perceived; prioritise by impact.
  • Bundle-size gate without per-route detail. Total goes up by 2 KB; you don't know which route. → Per-route budgets.
  • Synthetic-only measurement. Tests say fast; users say slow. → RUM mandatory; sample real users.
  • Performance work that breaks tests. Speed at expense of correctness. → Performance budget is part of the contract; correctness is not negotiable.

Tooling stack (typical)

| Concern | Tool |
| --- | --- |
| Bundle analyzer | webpack-bundle-analyzer, source-map-explorer, next bundle |
| Bundle gate | size-limit, bundlewatch, framework-native |
| Synthetic load | k6, Vegeta, Locust |
| RUM | Sentry, Datadog RUM, NewRelic Browser, CrUX |
| Profiling (server) | clinic.js, perf, py-spy, async-profiler (Java) |
| Profiling (web) | Chrome DevTools Performance, React Profiler |
| Core Web Vitals | Lighthouse CI, web-vitals lib |

See also