Design the abstraction layer between a model and your product — tools, file systems, artifacts, skills — so complex multi-step work feels natural to the model and stays reliable for users.

Tool & Capability Design Pattern

Design the abstraction layer between a model and your product — tools, file systems, artifacts, skills — so complex multi-step work feels natural to the model and stays reliable for users.

TL;DR (human)

Tools are the model's API to your product, and the model is a user of that API with peculiar ergonomics: it reads the description as the spec, can't see your code, and fails differently than humans. Design tools at the altitude of user intent ("schedule a meeting"), not raw endpoints; make descriptions and schemas self-explanatory; constrain inputs so whole error classes are impossible; return results the model can act on (including good errors); keep the exposed toolset small and gated to the task; and treat skills/artifacts/file-systems as first-class abstractions that turn multi-step workflows into something the model can drive reliably. The tool surface is product design for a non-human user — and it determines reliability as much as the prompt.

For agents

A tool is an API whose user is a model

The model can't read your source. The tool name, description, and schema are the entire contract — they are prompt-engineering surface, not just type plumbing. A vague description or an ambiguous parameter is a bug that shows up as "the model keeps calling it wrong." Write tool specs the way you'd write docs for an external developer who only has the function signature — because that's exactly the situation.

Design at the altitude of intent

Wrapping every REST endpoint 1:1 makes the model orchestrate plumbing — chaining six low-level calls, holding state, and failing in the middle. Instead, expose tools at the granularity of what the user wants to accomplish:

scheduleMeeting(attendees, window) — not listCalendars + getFreeBusy + createEvent + sendInvites for the model to assemble.
One tool = one coherent unit of intent the model can reason about atomically.
Push multi-step orchestration into the tool where the steps are deterministic; leave the model to decide which intent, not to hand-assemble every primitive.

Higher altitude = fewer turns, less state for the model to drop, fewer places to fail.

Make the contract self-explanatory and constrained

Names are semantic. searchCustomers not query2. The name alone should imply when to call it.
Descriptions state when to use, when not to, and what comes back. Include the failure shape. Disambiguate from sibling tools ("use this for X; for Y use otherTool").
Schema constrains the input space. Enums over free strings, required fields explicit, value ranges encoded. A constrained schema makes a class of wrong calls impossible rather than merely discouraged — the strongest form of guidance (see ../architecture/contracts-zod-pattern.md).
Validate at the boundary and return a usable error. Errors are part of the model's loop: a good error says what was wrong and how to fix the call, so the model self-corrects instead of looping. A raw stack trace is a dead end (see ../architecture/error-hierarchy.md).

Return results the model can use

Shape output for consumption, not for humans. Structured, concise, relevant. A tool that dumps 10k tokens of raw JSON blows the context budget and buries the signal (see context-management-pattern.md).
Reference large payloads by handle. Write big artifacts to a file/store and return a reference; let the model pull what it needs. Don't pour a 50-page document into the window.
Be honest about partial success. "Created 3 of 5, these 2 failed because…" beats a flat success/error the model can't reason about.

Keep the toolset small and gated

Every tool definition spends context tokens and adds a choice the model can get wrong. Fifty tools in the window is noise, latency, and mis-selection.

Expose only the tools relevant to the current task/state. Gate the set; reveal capabilities as the workflow reaches them.
Prune overlap. Two tools that do almost the same thing guarantee the model sometimes picks the wrong one.
Authorize at call time regardless. The toolset is ergonomics; permission is enforcement — the model offering a tool is never authorization to run it (see ../security/ai-llm-safety-pattern.md).

File systems, artifacts & skills as first-class abstractions

The richest agent products give the model more than function calls — they give it an environment:

A file system / artifact store lets the model produce, revise, and reference durable work products instead of regenerating everything inline. Artifacts are how multi-step output stays coherent across turns and how large results stay out of the context window (handle, not blob).
Skills package a reusable capability — instructions + tools + context for a task class — that the model invokes as a unit. A skill turns "here are 12 primitives, figure it out each time" into "do this known thing well," which is the difference between brittle and reliable on complex workflows.
Design these as the abstraction layer between product and model: the product exposes capabilities (files, artifacts, skills); the model composes them. Get this layer right and complex multi-step work feels natural to the model and reliable to users — get it wrong and the model improvises plumbing and drops state.

This abstraction layer is also exactly what you publish as a machine-readable surface so other agents can discover and drive it — see self-describe-pattern.md.

Tools are versioned, tested, evaluated

A tool's description and schema are behavior — when you change them, re-run the evals (the model may now call it differently). Version tool definitions alongside prompts (prompt-versioning-pattern.md), and add tool-call assertions to the deterministic eval tier (../quality/agent-eval-framework-pattern.md): given input X, the right tool is called with the right args, and destructive tools are not called unbidden.

Common failure modes

1:1 endpoint wrappers. Model orchestrates plumbing, drops state mid-chain. → Design at the altitude of user intent.
Vague descriptions. "The model keeps calling it wrong" — because the spec is the description. → Say when/when-not/returns; disambiguate siblings.
Free-string params that should be enums. Invites invalid calls. → Constrain the schema so wrong calls are impossible.
Raw errors returned to the model. Dead-end loops. → Actionable error: what's wrong + how to fix the call.
Tool dumps huge payloads inline. Blows context, buries signal. → Reference by handle; shape output for consumption.
Fifty tools always exposed. Noise, latency, mis-selection. → Gate to the task; prune overlap.
Tool change shipped without re-eval. Model's calling behavior silently shifts. → Version + tool-call assertions in the suite.
Treating the toolset as authorization. Offered ≠ allowed. → Enforce permission at call time.

Tool & Capability Design Pattern

Tool & Capability Design Pattern

TL;DR (human)

For agents

A tool is an API whose user is a model

Design at the altitude of intent

Make the contract self-explanatory and constrained

Return results the model can use

Keep the toolset small and gated

File systems, artifacts & skills as first-class abstractions

Tools are versioned, tested, evaluated

Common failure modes

See also

On this page