Cost Optimization Pattern (FinOps)
How to control cloud spend without micromanaging every commit.
Cost Optimization Pattern (FinOps)
How to control cloud spend without micromanaging every commit.
TL;DR (human)
Cloud spend without FinOps doubles every 18 months on autopilot. Discipline: per-team budget; per-tenant attribution; per-workload right-sizing; commitments + spot for predictable load; caching + query budgets; CI runs cost-aware too. The goal is "spend roughly what we said we'd spend" — not minimise at all cost.
For agents
Three FinOps phases (the FinOps Foundation framework)
| Phase | Question | Tools |
|---|---|---|
| Inform | Where is the money going? | Cost dashboards; per-service / per-team / per-tenant attribution |
| Optimize | What can we cut without harm? | Right-sizing; commitments; spot; cache; query reduction |
| Operate | How do we keep it that way? | Budgets; alerts; per-PR cost gates; FinOps rituals |
Most teams skip straight to Optimize; that's wrong. Inform first; without attribution, optimization is guesswork.
Inform — cost attribution
Every dollar should answer:
- Service: which microservice / Lambda / managed service.
- Environment: prod / staging / dev.
- Team: who owns it; who reviews bills.
- Tenant (multi-tenant systems): which customer drives the cost.
- Feature / surface (optional but powerful): which product surface.
Achieved via:
- Cloud tags / labels: applied to every resource at creation time; enforced via IaC (Terraform, Pulumi, CDK).
- Per-request tagging: spans / logs / metrics carry tenant + service tags.
- Cost allocation reports: cloud-native (AWS Cost Explorer, GCP Billing, Azure Cost Management) + per-tenant rollup.
Untagged resources = mystery costs. Hard rule: no untagged resources in production.
Per-tenant attribution
In multi-tenant SaaS, per-tenant cost drives:
- Pricing: usage-based or tier-based pricing depends on cost knowledge.
- Customer success: tenants spending 10× more than they pay are flight risks (acquisition cost will outweigh).
- Capacity planning: who would grow + how much.
- Quota tuning: where to put limits.
Computed from observability tags (per observability-pattern.md). Roll up nightly into a per-tenant cost table.
Right-sizing
Most workloads over-provision. Symptoms:
- CPU steady < 30%.
- Memory steady < 50%.
- Network rarely saturated.
Right-sizing process:
- Measure: 30+ days of utilisation per instance / service.
- Recommend: smaller instance class; lower memory; fewer replicas.
- Stage: change in staging; measure.
- Promote: change in production with rollback path.
Hold a reasonable cushion (50–70% utilisation steady-state; lower for spiky workloads).
Auto-scaling helps but only when:
- Cold-start is acceptable (sub-minute scale-up).
- Stateless workload.
- Predictable load shape.
Commitments + spot
Cloud providers reward predictability:
- Reserved instances / commitments (1y, 3y): 30–70% off list price.
- Spot instances: 60–90% off; reclaimed on short notice.
Mix:
- Steady baseline: covered by commitments (~70% of capacity).
- Burst above baseline: spot or on-demand.
- Stateful / critical: on-demand or reserved; never spot.
Commitments are a forecasting bet. Under-commit and miss the discount; over-commit and pay for unused capacity. Default to under-committing.
Query + storage budgets
In data-heavy systems, database cost often dominates compute.
Per-endpoint discipline (extends performance-budgets-pattern.md):
- Query count budget per request (N+1 detection: > 20 = probable bug).
- Bytes scanned budget per request (avoid full-table scans).
- Result size budget (paginate everything; max 100 rows per page default).
- Cold storage tier for data > 90 days unused.
- Compression: enable everywhere it pays (most columnar stores).
Per-tenant query budgets prevent noisy-neighbor cost spikes:
- Max query CPU-time per tenant per minute.
- Max bytes scanned per tenant per hour.
- Circuit-break at limit; surface as
QUOTA_EXCEEDED(see../security/multi-tenant-isolation-pattern.md).
Egress + data transfer
Often the surprise on cloud bills:
- Cross-region transfer: usually expensive.
- Egress to internet: expensive at scale.
- Cross-AZ transfer: sometimes free, sometimes not.
Mitigations:
- Keep data hot in the same region / AZ as the consumer.
- CDN for public assets (one-time push; cheap edge serving).
- Avoid cross-region replication for non-critical data.
- Compress on the wire.
Background jobs + queues
Cheaper than synchronous:
- Async background jobs scale on cheaper compute (spot OK).
- Queues buffer bursts; smooth provisioning.
- Job retries are cheap when the work is idempotent.
Discipline:
- Idempotency-key on every job (replays don't double-charge).
- Dead-letter queue for permanently failing jobs.
- Visibility into queue depth + worker utilisation.
Cache hit rate
A cache pays for itself when:
- Hit rate > 60% in steady state.
- Origin compute / DB cost per request > cache cost per request.
Measure cache hit rate per cache; track over time. Hit rate drops are signals (key churn, app pattern change, eviction pressure).
See ../architecture/distributed-data-pattern.md for cache tiers.
Dev / CI cost
Often overlooked:
- CI minutes: cache aggressively (per
ci-cd-pipeline-pattern.md). - Per-PR ephemeral environments: convenient but expensive; lifecycle them (auto-tear-down after N days).
- Dev databases: long-running instances; sleep / terminate on no-activity.
- Build artefact storage: tier old artefacts to cold; expire after N versions.
A 10× cost gap exists between "every team has its own everything 24/7" and "shared dev infrastructure with lifecycle policies".
Cost gates
Beyond budgets, per-PR cost signals:
- Bundle size increase → bandwidth + CDN cost.
- New cloud resources in IaC diff → manual review.
- New paid service dependency → PR comment with monthly estimate.
- Performance regression → potentially higher per-request cost.
These extend the gate suite (see quality-gates-pattern.md).
FinOps rituals
| Cadence | Activity |
|---|---|
| Daily | Cost-anomaly alerts trigger on spike |
| Weekly | Top spenders dashboard reviewed |
| Monthly | Per-service / per-team budgets reviewed |
| Quarterly | Commitments + right-sizing review |
| Annually | Cloud-provider contract negotiation; multi-cloud strategy review |
Anomaly detection
A 2× cost spike in 24h means something changed. Possible causes:
- New deploy with a query regression.
- A customer's traffic spike (good or bad).
- A bug producing infinite retries.
- A test inadvertently shipped that hits an expensive path.
- Account compromise (cryptominer; spam).
Anomaly alert routes to the team that owns the service. SEV depends on magnitude:
- 1.5× = warn.
- 3× = SEV-3.
- 10× = SEV-1 (probable runaway or compromise).
Cost-per-X metrics
Useful tracking metrics:
- Cost per active user (DAU / MAU).
- Cost per request.
- Cost per tenant (per pricing-tier).
- Cost per transaction (for products with discrete units of value).
Surface these in dashboards alongside business metrics. Engineering leadership reasons about cost in product terms.
Multi-cloud caveat
Multi-cloud is sometimes proposed for cost. Reality:
- Cross-cloud egress is expensive; data gravity locks you in.
- Operations cost (per-cloud expertise, tooling) often exceeds savings.
- Single-cloud + multi-region usually delivers most of the resilience.
Adopt multi-cloud for sovereignty / vendor risk / specific service needs — not for cost.
Common failure modes
- No tagging. Cost mystery; cannot attribute. → Enforce in IaC.
- Over-provisioning on autopilot. Auto-scaling configured min ≥ peak; never scales down. → Right-size min.
- No budgets. Bills surprise leadership. → Per-team budget; alerts at 50/75/90%.
- Cost work treated as separate from engineering. → Engineers see their cost; act on it.
- Optimisations that break SLOs. Cost down; quality down. → Cost is constrained-by SLO, not above it.
- Spot for stateful. Reclaimed mid-write. → On-demand for state; spot for stateless.
- Cross-region traffic accidental. Service in region A calls DB in region B. → Network policy + alert.
Tooling stack (typical)
| Concern | Tool |
|---|---|
| Cloud-native cost | AWS Cost Explorer + Budgets, GCP Billing, Azure Cost Management |
| Per-resource analysis | Vantage, Infracost, CloudHealth, Spot.io |
| Per-Kubernetes pod cost | Kubecost, OpenCost |
| Anomaly detection | CloudZero, native cloud anomaly alerts |
| Per-tenant attribution | In-house roll-up from observability tags |
| Right-sizing recommendations | AWS Compute Optimizer, GCP Recommender |
| FinOps governance | FinOps Foundation framework |
Adoption path
- Day 0: tag everything; one budget per environment.
- Month 1: per-service attribution; top-spenders dashboard.
- Quarter 1: right-sizing review; first commitments / reservations.
- Quarter 2: per-tenant attribution; cost-per-X dashboards.
- Quarter 3+: PR-level cost signals; cost-aware feature design.
- Mature: FinOps team / role; engineering OKRs include cost.
See also
performance-budgets-pattern.md— performance and cost overlap heavily.observability-pattern.md— tags drive cost attribution.../security/multi-tenant-isolation-pattern.md— per-tenant quotas.../architecture/distributed-data-pattern.md— cache + replica costs.ci-cd-pipeline-pattern.md— CI cost.