How to build the layer between cloud / infra and product engineers, so product teams ship fast without re-learning infra each time.

Platform Engineering + Internal Developer Platform Pattern

How to build the layer between cloud / infra and product engineers, so product teams ship fast without re-learning infra each time.

TL;DR (human)

Platform engineering productises infrastructure for internal developers. The deliverable is an Internal Developer Platform (IDP) — a curated set of golden paths (templates, paved roads, self-service tools) that let teams ship without becoming infra experts. Tracked metrics: lead time, deploy frequency, change failure rate, time to recover (DORA). The platform team's customer is the product team.

For agents

What an IDP includes

Surface	What it provides
Service templates	`new-service` scaffolds: code, CI, deploy, observability wired
Self-service deploy	"I want to ship this" → one command / PR
Self-service env	"I need staging for my PR" → ephemeral env automatically
Service catalog	"Where does X live?" → searchable inventory (Backstage et al)
Observability defaults	dashboards, alerts, SLOs scaffolded per service
Secrets self-service	"I need a new vault entry" → request + audit
Documentation hub	API specs, ADRs, runbooks indexed
Cost visibility	per-team / per-service spend dashboards

The platform is a product. Product engineers are customers; survey them.

Golden paths

A golden path is the easy, paved way to do a common thing — vs the unpaved way, which is allowed but offers no support.

Examples:

New microservice: scaffold → live in 30 minutes.
Add a new background job: existing job-runner abstraction.
Add a new RPC method: existing dispatcher + schema package.
Add a new database: managed RDS via IaC.

Going off-road is allowed. Going off-road for everything = the platform isn't paving real needs.

Self-service ≠ unsupervised

Self-service means a product engineer can do it without filing a ticket. It doesn't mean unreviewed:

Pre-flight checks (cost, security review, capacity).
Audit trail of who provisioned what.
Default-secure choices baked in.
Auto-rollback on health-check failure.

Self-service + guardrails = velocity + safety. Self-service without guardrails = chaos. Guardrails without self-service = bottleneck.

DORA metrics

Per-team or per-service measurement (DevOps Research and Assessment):

Metric	Definition	Elite team target
Deploy frequency	How often code reaches prod	Multiple per day
Lead time for changes	Commit → production	< 1 day
Change failure rate	% of deploys causing incident	0-15%
Time to restore	Incident → resolved	< 1 hour

These metrics drive platform investment. The platform team's goal is to move every product team toward elite.

Service catalog

Per service:

Name + description.
Owning team.
On-call info.
Source repo.
Documentation links (ADRs, RFCs, runbooks).
Dependencies (which services this depends on; which depend on this).
Tier (critical, important, supporting).
Compliance tags.

Tool: Backstage, Cortex, OpsLevel, in-house.

The catalog answers "who do I ask?" + "what depends on this?" + "is this safe to change?".

Templates + scaffolds

A create-\<thing\> CLI / template:

$ npx create-service my-new-service --type=node-ts-api
# scaffolds:
# - source skeleton
# - package.json with workspace conventions
# - Dockerfile + CI workflow
# - terraform module
# - observability defaults
# - README + ADR template

Templates encode conventions (per universal.md, ts-concrete.md) so new services start compliant.

Templates evolve; old services migrate via a separate "update-service" tool.

Ephemeral environments

Per-PR preview environments:

Triggered automatically by PR open.
Live URL posted to PR.
Resources auto-torn-down on PR close (or N days idle).
Cost-attributed to the PR author / team.

Lets reviewers + designers + product see real changes without staging churn.

Capacity + cost guardrails

Self-service is dangerous without guardrails:

Per-team budget caps.
Auto-tear-down of idle resources.
Required tags (per ../quality/cost-optimization-pattern.md).
New service request → reviewed if cost > threshold.

Documentation as a platform feature

A docs portal aggregates:

Per-service READMEs (Backstage-rendered).
API references (OpenAPI / GraphQL schemas / RPC method indexes).
ADRs / RFCs.
Runbooks.
Tutorials / quickstarts.
Search.

Engineers find docs in seconds, not minutes. Search quality matters more than doc volume.

Platform team's customer

Product engineers. Their satisfaction is the platform team's KPI:

Onboarding time: new engineer → first PR shipped.
Time to spin up a new service.
"How happy are you with the platform?" — quarterly survey.
Support tickets: their volume + topics.

Anti-pattern: platform team optimises for its own elegance, ignores adopters' pain.

Inner-source contributions

Product teams contribute back to the platform:

A team builds a niche helper; useful broadly; promote into platform.
A team finds a bug in a template; patches it; gets credit.
Codeowners model: platform team owns merging; community drives PRs.

Healthy platforms grow from the edges, not from the center alone.

Anti-pattern: the gatekeeper platform

Symptoms:

Every product change requires a platform ticket.
"Wait 2 weeks for the platform team to enable this."
Workarounds proliferate.
Product teams build shadow infrastructure.

Cure: more golden paths; more self-service; fewer tickets.

Common failure modes

No platform team; everyone reinvents. Drift; bus factor low. → Form a platform team when the org passes ~30 engineers.
Platform team builds without users. Adoption zero. → Adopt-first design.
Self-service without guardrails. Cost / security incidents. → Guardrails baked in.
Old services don't migrate. Platform forks; legacy slows. → Migration tooling; deprecation.
Templates rot. New service uses old template; conventions wrong. → Templates own conventions; CI verifies.
No service catalog. Org-wide knowledge in heads. → Backstage or equivalent.

Tooling stack (typical)

Concern	Tool
Service catalog	Backstage, Cortex, OpsLevel, Compass
IaC	Terraform, Pulumi, CDK, Crossplane
Templates	cookiecutter, plop, Backstage Software Templates
Ephemeral envs	Vercel preview deploys, Render, Fly.io, custom on k8s
Self-service portals	Backstage, in-house
DORA metrics	Faros, LinearB, in-house from CI + deploy logs

Adoption path

< 30 engineers: no platform team needed; shared playbook.
30-100: first platform engineer; service catalog; templates.
100-300: platform team of 3-8; self-service deploy; DORA tracking.
300+: full IDP; multiple platform sub-teams (compute, data, security, observability).

Don't form a platform team too early; you have nothing to platform yet.

Platform Engineering + Internal Developer Platform Pattern

Platform Engineering + Internal Developer Platform Pattern

TL;DR (human)

For agents

What an IDP includes

Golden paths

Self-service ≠ unsupervised

DORA metrics

Service catalog

Templates + scaffolds

Ephemeral environments

Capacity + cost guardrails

Documentation as a platform feature

Platform team's customer

Inner-source contributions

Anti-pattern: the gatekeeper platform

Common failure modes

Tooling stack (typical)

Adoption path

See also

On this page