mirror of
https://github.com/anthropics/skills
synced 2026-07-05 03:26:53 +00:00
Compare commits
3 Commits
c30d329f58
...
5754626092
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5754626092 | ||
|
|
2235be7c60 | ||
|
|
5d25128289 |
@@ -2,7 +2,7 @@
|
||||
name: claude-api
|
||||
description: |-
|
||||
Reference for the Claude API / Anthropic SDK — model ids, pricing, params, streaming, tool use, MCP, agents, caching, token counting, model migration.
|
||||
TRIGGER — read BEFORE opening the target file; don't skip because it "looks like a one-liner" — whenever: the prompt names Claude/Anthropic in any form (Claude, Anthropic, Opus, Sonnet, Haiku, `anthropic`, `@anthropic-ai`, `claude-*`, `us.anthropic.*`, `[1m]`); the user asks about an LLM (pricing/model choice/limits/caching) — never answer from memory; OR the task is LLM-shaped with provider unstated (agent/MCP/tool-definition/multi-agent/RAG/LLM-judge/computer-use; generate/summarize/extract/classify/rewrite/converse over NL; debugging refusals/cutoffs/streaming/tool-calls/tokens).
|
||||
TRIGGER — read BEFORE opening the target file; don't skip because it "looks like a one-liner" — whenever: the prompt names Claude/Anthropic in any form (Claude, Anthropic, Fable, Opus, Sonnet, Haiku, `anthropic`, `@anthropic-ai`, `claude-*`, `us.anthropic.*`, `[1m]`); the user asks about an LLM (pricing/model choice/limits/caching) — never answer from memory; OR the task is LLM-shaped with provider unstated (agent/MCP/tool-definition/multi-agent/RAG/LLM-judge/computer-use; generate/summarize/extract/classify/rewrite/converse over NL; debugging refusals/cutoffs/streaming/tool-calls/tokens).
|
||||
SKIP only when another provider is being worked on (overrides all triggers): OpenAI/GPT/Gemini/Llama/Mistral/Cohere/Ollama named in the query; OR `grep -rE 'openai|langchain_openai|google.generativeai|genai|mistralai|cohere|ollama'` over the project hits (run this grep FIRST if no provider named — don't Read the file).
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
@@ -167,17 +167,31 @@ Everything goes through `POST /v1/messages`. Tools and output constraints are fe
|
||||
|
||||
---
|
||||
|
||||
## Current Models (cached: 2026-05-26)
|
||||
## Current Models (cached: 2026-06-04)
|
||||
|
||||
| Model | Model ID | Context | Input $/1M | Output $/1M |
|
||||
| ----------------- | ------------------- | -------------- | ---------- | ----------- |
|
||||
| Claude Fable 5 | `claude-fable-5` | 1M | $10.00 | $50.00 |
|
||||
| Claude Mythos 5 (Project Glasswing only) | `claude-mythos-5` | 1M | $10.00 | $50.00 |
|
||||
| Claude Opus 4.8 | `claude-opus-4-8` | 1M | $5.00 | $25.00 |
|
||||
| Claude Opus 4.7 | `claude-opus-4-7` | 1M | $5.00 | $25.00 |
|
||||
| Claude Opus 4.6 | `claude-opus-4-6` | 1M | $5.00 | $25.00 |
|
||||
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M | $3.00 | $15.00 |
|
||||
| Claude Haiku 4.5 | `claude-haiku-4-5` | 200K | $1.00 | $5.00 |
|
||||
|
||||
**ALWAYS use `claude-opus-4-8` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.
|
||||
**ALWAYS use `claude-opus-4-8` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours. Use `claude-fable-5` only when the user explicitly asks for Claude Fable 5, "fable", or Anthropic's most capable model — it has different API behavior than the Opus family (see below) and pricing that exceeds Opus-tier.
|
||||
|
||||
### Claude Fable 5 (`claude-fable-5`) — most capable widely released model
|
||||
|
||||
Claude Fable 5 is Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work. **Claude Mythos 5** (`claude-mythos-5`) offers the same capabilities, pricing, and API surface through Project Glasswing (participation is the only way to access it), succeeding the invitation-only Claude Mythos Preview (`claude-mythos-preview`) — everything below applies to both models. 1M context window (the maximum is also the default), 128K max output. Key API differences from Opus-tier — see `shared/model-migration.md` → Migrating to Claude Fable 5 for details:
|
||||
|
||||
- **Thinking is always on** — omit the `thinking` parameter entirely (or send `{type: "adaptive"}`). Any other explicit configuration is rejected: `{type: "disabled"}` and `{type: "enabled", budget_tokens: N}` both return a 400. Control depth with `output_config.effort` (supports `low` through `xhigh` and `max`).
|
||||
- **Protected thinking = the raw chain of thought, not the summary** — responses carry regular `thinking` blocks (not `redacted_thinking`): `display: "summarized"` returns a readable summary, `"omitted"` (the default) leaves the `thinking` field as an empty string; the raw chain of thought is never exposed on any model. Replay rules: pass thinking blocks back exactly as received on the same model (including empty-text blocks — the API rejects *modified* blocks, not read ones); a **different** model **silently ignores** them (not an error), but ignored blocks still bill input tokens — strip them when switching models for good.
|
||||
- **New tokenizer** — the same content tokenizes to roughly 30% more tokens than on Opus-tier models. Don't reuse token counts or `max_tokens` settings measured on other models; re-baseline with `count_tokens`.
|
||||
- **`refusal` stop reason** — safety classifiers may decline a request (HTTP 200, `stop_reason: "refusal"`, with a `stop_details` category). A pre-output refusal has an empty `content` array and is not billed at all; a mid-stream refusal bills the already-streamed output — discard the partial output. Always check `stop_reason` before reading `content`. To retry on another model: the beta `fallbacks` parameter (Claude API and Claude Platform on AWS) retries server-side in one round trip; the GA SDKs' `BetaRefusalFallbackMiddleware` + `BetaFallbackState` handle client-side retry everywhere else (incl. Bedrock/Vertex); fallback credit refunds the cache-switch cost of client-side retries. See the migration guide's refusal section.
|
||||
- **No assistant prefill** — same as the rest of the 4.6+ family.
|
||||
- **30-day data retention required** — Claude Fable 5 is not available under zero data retention; requests from an org whose retention configuration doesn't meet the requirement return `400 invalid_request_error`.
|
||||
- **Longer turns, different prompting** — single requests on hard tasks can run many minutes (plan timeouts/streaming/progress UX); effort sweeps should include low/medium for routine work; prompts written for prior models are often too prescriptive and reduce output quality. See `shared/model-migration.md` → Migrating to Claude Fable 5 → Behavioral shifts (prompt-tunable) for the recommended prompt snippets (anti-overplanning, no-tidying, grounded progress claims, boundaries, async sub-agents, memory, `send_to_user`).
|
||||
|
||||
**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-6`, never `claude-sonnet-4-6-20251114` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.
|
||||
|
||||
@@ -189,13 +203,13 @@ A note: if any of the model strings above look unfamiliar to you, that's to be e
|
||||
|
||||
## Thinking & Effort (Quick Reference)
|
||||
|
||||
**Opus 4.8 / 4.7 — Adaptive thinking only:** Use `thinking: {type: "adaptive"}`. `thinking: {type: "enabled", budget_tokens: N}` returns a 400 — adaptive is the only on-mode. `{type: "disabled"}` and omitting `thinking` both work. Sampling parameters (`temperature`, `top_p`, `top_k`) are also removed and will 400. Opus 4.8 keeps the same request surface as 4.7 (no new breaking changes) — see `shared/model-migration.md` → Migrating to Opus 4.8 for the behavioral re-tuning, and → Migrating to Opus 4.7 for the full breaking-change list when coming from 4.6 or earlier. Note: with `thinking` disabled, Opus 4.8 may write longer reasoning into the visible response — leave adaptive thinking on, or add a final-answer-only instruction (see the migration guide).
|
||||
**Opus 4.6 — Adaptive thinking (recommended):** Use `thinking: {type: "adaptive"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and should not be used for new code. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). **When the user asks for "extended thinking", a "thinking budget", or `budget_tokens`: always use Opus 4.8, 4.7, or 4.6 with `thinking: {type: "adaptive"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` for new 4.6/4.7/4.8 code and do NOT switch to an older model.** *Gradual-migration carve-out:* `budget_tokens` is still functional on Opus 4.6 and Sonnet 4.6 as a transitional escape hatch — if you're migrating existing code and need a hard token ceiling before you've tuned `effort`, see `shared/model-migration.md` → Transitional escape hatch. Note: this carve-out does **not** apply to Opus 4.7 or 4.8 — `budget_tokens` is fully removed there.
|
||||
**Effort parameter (GA, no beta header):** Controls thinking depth and overall token spend via `output_config: {effort: "low"|"medium"|"high"|"max"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Opus 4.7 added `"xhigh"` (between `high` and `max`) — the best setting for most coding and agentic use cases on Opus 4.7/4.8, and the default in Claude Code; use a minimum of `high` for most intelligence-sensitive work. Works on Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. On Opus 4.7 and 4.8, effort matters more than on any prior Opus — re-tune it when migrating, and run long-horizon/agentic tasks at `high`/`xhigh` with the full task spec given up front. Combine with adaptive thinking for the best cost-quality tradeoffs. Lower effort means fewer and more-consolidated tool calls, less preamble, and terser confirmations — `high` is often the sweet spot balancing quality and token efficiency; use `max` when correctness matters more than cost; use `low` for subagents or simple tasks.
|
||||
**Fable 5 / Opus 4.8 / 4.7 — Adaptive thinking only:** Use `thinking: {type: "adaptive"}`. `thinking: {type: "enabled", budget_tokens: N}` returns a 400 — adaptive is the only on-mode. On Opus 4.8 and 4.7, `{type: "disabled"}` and omitting `thinking` both work; on Fable 5, an explicit `{type: "disabled"}` returns a 400 — omit the `thinking` param entirely instead. Sampling parameters (`temperature`, `top_p`, `top_k`) are also removed and will 400. Opus 4.8 keeps the same request surface as 4.7 (no new breaking changes) — see `shared/model-migration.md` → Migrating to Opus 4.8 for the behavioral re-tuning, and → Migrating to Opus 4.7 for the full breaking-change list when coming from 4.6 or earlier. Note: with `thinking` disabled, Opus 4.8 may write longer reasoning into the visible response — leave adaptive thinking on, or add a final-answer-only instruction (see the migration guide).
|
||||
**Opus 4.6 — Adaptive thinking (recommended):** Use `thinking: {type: "adaptive"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and should not be used for new code. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). **When the user asks for "extended thinking", a "thinking budget", or `budget_tokens`: always use Fable 5, Opus 4.8, 4.7, or 4.6 with `thinking: {type: "adaptive"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` for new 4.6/4.7/4.8 code and do NOT switch to an older model.** *Gradual-migration carve-out:* `budget_tokens` is still functional on Opus 4.6 and Sonnet 4.6 as a transitional escape hatch — if you're migrating existing code and need a hard token ceiling before you've tuned `effort`, see `shared/model-migration.md` → Transitional escape hatch. Note: this carve-out does **not** apply to Fable 5, Opus 4.7 or 4.8 — `budget_tokens` is fully removed there.
|
||||
**Effort parameter (GA, no beta header):** Controls thinking depth and overall token spend via `output_config: {effort: "low"|"medium"|"high"|"max"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is supported on Fable 5, Opus 4.6 and later, and Sonnet 4.6 (not Haiku or earlier Sonnets). Opus 4.7 added `"xhigh"` (between `high` and `max`) — the best setting for most coding and agentic use cases on Fable 5 / Opus 4.7/4.8, and the default in Claude Code; use a minimum of `high` for most intelligence-sensitive work. Works on Fable 5, Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. On Fable 5, Opus 4.7 and 4.8, effort matters more than on any prior Opus — re-tune it when migrating, and run long-horizon/agentic tasks at `high`/`xhigh` with the full task spec given up front. Combine with adaptive thinking for the best cost-quality tradeoffs. Lower effort means fewer and more-consolidated tool calls, less preamble, and terser confirmations — `high` is often the sweet spot balancing quality and token efficiency; use `max` when correctness matters more than cost; use `low` for subagents or simple tasks.
|
||||
|
||||
**Opus 4.8 / 4.7 — thinking content omitted by default:** `thinking` blocks still stream but their text is empty unless you opt in with `thinking: {type: "adaptive", display: "summarized"}` (default is `"omitted"`). Silent change — no error. If you stream reasoning to users, the default looks like a long pause before output; set `"summarized"` to restore visible progress.
|
||||
**Thinking display — `"omitted"` by default on Fable 5 / Mythos 5 / Opus 4.8 / 4.7:** `display: "summarized"` returns a readable summary of the reasoning; `"omitted"` (the default on all four — a silent change from Opus 4.6, where it was `"summarized"`) streams `thinking` blocks with empty text. `display` controls visibility only — thinking happens and is billed the same under every setting; the raw chain of thought is never exposed on any model. If you stream reasoning to users, the default looks like a long pause before output — set `thinking: {type: "adaptive", display: "summarized"}` explicitly. (Independent of display, echo thinking blocks back unchanged when continuing on the same model; other models silently ignore them — see the migration guide.)
|
||||
|
||||
**Task Budgets (beta, Opus 4.7 / 4.8):** `output_config: {task_budget: {type: "tokens", total: N}}` tells the model how many tokens it has for a full agentic loop — it sees a running countdown and self-moderates (minimum 20,000; beta header `task-budgets-2026-03-13`). Distinct from `max_tokens`, which is an enforced per-response ceiling the model is not aware of. See `shared/model-migration.md` → Task Budgets.
|
||||
**Task Budgets (beta, Fable 5 / Opus 4.7 / 4.8):** `output_config: {task_budget: {type: "tokens", total: N}}` tells the model how many tokens it has for a full agentic loop — it sees a running countdown and self-moderates (minimum 20,000; beta header `task-budgets-2026-03-13`). Distinct from `max_tokens`, which is an enforced per-response ceiling the model is not aware of. See `shared/model-migration.md` → Task Budgets.
|
||||
|
||||
**Sonnet 4.6:** Supports adaptive thinking (`thinking: {type: "adaptive"}`). `budget_tokens` is deprecated on Sonnet 4.6 — use adaptive thinking instead.
|
||||
|
||||
@@ -205,7 +219,7 @@ A note: if any of the model strings above look unfamiliar to you, that's to be e
|
||||
|
||||
## Compaction (Quick Reference)
|
||||
|
||||
**Beta, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** For long-running conversations that may exceed the 1M context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header `compact-2026-01-12`.
|
||||
**Beta, Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** For long-running conversations that may exceed the 1M context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header `compact-2026-01-12`.
|
||||
|
||||
**Critical:** Append `response.content` (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.
|
||||
|
||||
@@ -235,7 +249,7 @@ For placement patterns, architectural guidance, and the silent-invalidator audit
|
||||
|
||||
**Mandatory flow:** Agent (once) → Session (every run). `model`/`system`/`tools` live on the agent, never the session. See `shared/managed-agents-overview.md` for the full reading guide, beta headers, and pitfalls.
|
||||
|
||||
**Beta headers:** `managed-agents-2026-04-01` — the SDK sets this automatically for all `client.beta.{agents,environments,sessions,vaults,memory_stores}.*` calls. Skills API uses `skills-2025-10-02` and Files API uses `files-api-2025-04-14`, but you don't need to explicitly pass those in for endpoints other than `/v1/skills` and `/v1/files`.
|
||||
**Beta headers:** `managed-agents-2026-04-01` — the SDK sets this automatically for all `client.beta.{agents,environments,sessions,vaults,memory_stores,deployments,deployment_runs}.*` calls. Skills API uses `skills-2025-10-02` and Files API uses `files-api-2025-04-14`, but you don't need to explicitly pass those in for endpoints other than `/v1/skills` and `/v1/files`.
|
||||
|
||||
**Subcommands** — invoke directly with `/claude-api <subcommand>`:
|
||||
|
||||
@@ -243,12 +257,15 @@ For placement patterns, architectural guidance, and the silent-invalidator audit
|
||||
|---|---|
|
||||
| `managed-agents-onboard` | Walk the user through setting up a Managed Agent from scratch. **Read `shared/managed-agents-onboarding.md` immediately** and follow its interview script: mental model → know-or-explore branch → template config → session setup → **pre-flight viability check** → emit code. The viability check (reconcile the stated job against configured tools/credentials/data) catches under-resourced setups — missing a tool, credential, or data access — before the agent burns budget. Do not summarize — run the interview. |
|
||||
|
||||
**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, outcomes, multiagent, webhooks, memory, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.
|
||||
**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, outcomes, multiagent, webhooks, memory, scheduled-deployments, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.
|
||||
|
||||
**When the user wants to set up a Managed Agent from scratch** (e.g. "how do I get started", "walk me through creating one", "set up a new agent"): read `shared/managed-agents-onboarding.md` and run its interview — same flow as the `managed-agents-onboard` subcommand.
|
||||
|
||||
**When the user asks "how do I write the client code for X":** reach for `shared/managed-agents-client-patterns.md` — covers lossless stream reconnect, `processed_at` queued/processed gate, interrupt, `tool_confirmation` round-trip, the correct idle/terminated break gate, post-idle status race, stream-first ordering, file-mount gotchas, keeping credentials host-side via custom tools, etc.
|
||||
|
||||
**When the user wants the agent to run on a schedule** (cron, "every night", "weekly report"): read `shared/managed-agents-scheduled-deployments.md` — deployments fire sessions autonomously on a cron cadence, with run records, retries, and auto-pause.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Reading Guide
|
||||
@@ -265,8 +282,10 @@ After detecting the language, read the relevant files based on what the user nee
|
||||
|
||||
**Long-running conversations (may exceed context window):**
|
||||
→ Read `{lang}/claude-api/README.md` — see Compaction section
|
||||
**Migrating to a newer model (Opus 4.8 / Opus 4.7 / Opus 4.6 / Sonnet 4.6) or replacing a retired model:**
|
||||
**Migrating to a newer model (Fable 5 / Opus 4.8 / Opus 4.7 / Opus 4.6 / Sonnet 4.6) or replacing a retired model:**
|
||||
→ Read `shared/model-migration.md`
|
||||
**Prompting or tuning Fable 5 (long turns, effort, verbosity, autonomous runs, sub-agents):**
|
||||
→ Read `shared/model-migration.md` → Migrating to Fable 5 → Behavioral shifts (prompt-tunable) + Long-running agent recommendations
|
||||
**Prompt caching / optimize caching / "why is my cache hit rate low":**
|
||||
→ Read `shared/prompt-caching.md` + `{lang}/claude-api/README.md` (Prompt Caching section)
|
||||
**Count tokens in a file / prompt / diff ("how many tokens is X"):**
|
||||
@@ -322,13 +341,15 @@ Live documentation URLs are in `shared/live-sources.md`.
|
||||
## Common Pitfalls
|
||||
|
||||
- Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.
|
||||
- **Opus 4.8 / 4.7 thinking:** Adaptive only. `thinking: {type: "enabled", budget_tokens: N}` returns 400 — `budget_tokens` is fully removed (along with `temperature`, `top_p`, `top_k`). Use `thinking: {type: "adaptive"}`. Opus 4.8 inherits this surface from 4.7 with no new breaking changes.
|
||||
- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: "adaptive"}` — do NOT use `budget_tokens` for new 4.6 code (deprecated on both Opus 4.6 and Sonnet 4.6; for gradual migration of existing code, see the transitional escape hatch in `shared/model-migration.md` — note this carve-out does not apply to Opus 4.7 or 4.8). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.
|
||||
- **4.6/4.7/4.8 family prefill removed:** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead.
|
||||
- **Fable 5 / Opus 4.8 / 4.7 thinking:** Adaptive only. `thinking: {type: "enabled", budget_tokens: N}` returns 400 — `budget_tokens` is fully removed (along with `temperature`, `top_p`, `top_k`). Use `thinking: {type: "adaptive"}`. Opus 4.8 inherits this surface from 4.7 with no new breaking changes; Fable 5 adds one — an explicit `thinking: {type: "disabled"}` returns a 400 (accepted on 4.7/4.8); omit the param instead.
|
||||
- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: "adaptive"}` — do NOT use `budget_tokens` for new 4.6 code (deprecated on both Opus 4.6 and Sonnet 4.6; for gradual migration of existing code, see the transitional escape hatch in `shared/model-migration.md` — note this carve-out does not apply to Fable 5, Opus 4.7 or 4.8). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.
|
||||
- **Prefill removed (Fable 5 and the 4.6/4.7/4.8 family):** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Fable 5, Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead. (One exception: the fallback-credit prefill claim — when redeeming a credit with `fallback_has_prefill_claim: true`, the server accepts the echoed assistant message; see the migration guide's refusal section.)
|
||||
- **Fable 5 `refusal` stop reason:** Safety classifiers may decline a request — a successful HTTP 200 with `stop_reason: "refusal"` (pre-output: empty `content`, nothing billed; mid-stream: partial output billed — discard it). Check `stop_reason` before reading `response.content[0]`, or you'll hit index errors on refused requests. To retry on another model, replaying history as-is works — other models silently ignore the refused model's thinking blocks — but ignored blocks still bill input tokens, so strip them when switching for good (exception: a fallback-credit redemption must echo the refused body exactly, thinking blocks included).
|
||||
- **Fable 5 tokenizer:** ~30% more tokens for the same content vs Opus-tier models. Token counts, context-window budgets, and `max_tokens` values measured on other models don't transfer — re-measure with `count_tokens` passing `model: "claude-fable-5"` (the response includes counts under both tokenizers).
|
||||
- **Confirm migration scope before editing:** When a user asks to migrate code to a newer Claude model without naming a specific file, directory, or file list, **ask which scope to apply first** — the entire working directory, a specific subdirectory, or a specific set of files. Do not start editing until the user confirms. Imperative phrasings like "migrate my codebase", "move my project to X", "upgrade to Sonnet 4.6", or bare "migrate to Opus 4.8" are **still ambiguous** — they tell you what to do but not where, so ask. Proceed without asking only when the prompt names an exact file, a specific directory, or an explicit file list ("migrate `app.py`", "migrate everything under `services/`", "update `a.py` and `b.py`"). See `shared/model-migration.md` Step 0.
|
||||
- **`max_tokens` defaults:** Don't lowball `max_tokens` — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to `~16000` (keeps responses under SDK HTTP timeouts). For streaming requests, default to `~64000` (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (`~256`), cost caps, deliberately short outputs, or **`max_tokens: 0`** for cache pre-warming (see `shared/prompt-caching.md` → Pre-warming).
|
||||
- **128K output tokens:** Opus 4.6, Opus 4.7, and Opus 4.8 support up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.
|
||||
- **Tool call JSON parsing (4.6/4.7/4.8 family):** Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.
|
||||
- **128K output tokens:** Fable 5, Opus 4.6, Opus 4.7, and Opus 4.8 support up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.
|
||||
- **Tool call JSON parsing (Fable 5 and the 4.6/4.7/4.8 family):** Fable 5, Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.
|
||||
- **Structured outputs (all models):** Use `output_config: {format: {...}}` instead of the deprecated `output_format` parameter on `messages.create()`. This is a general API change, not 4.6-specific.
|
||||
- **Don't reimplement SDK functionality:** The SDK provides high-level helpers — use them instead of building from scratch. Specifically: use `stream.finalMessage()` instead of wrapping `.on()` events in `new Promise()`; use typed exception classes (`Anthropic.RateLimitError`, etc.) instead of string-matching error messages; use SDK types (`Anthropic.MessageParam`, `Anthropic.Tool`, `Anthropic.Message`, etc.) instead of redefining equivalent interfaces.
|
||||
- **Don't define custom types for SDK data structures:** The SDK exports types for all API objects. Use `Anthropic.MessageParam` for messages, `Anthropic.Tool` for tool definitions, `Anthropic.ToolUseBlock` / `Anthropic.ToolResultBlockParam` for tool results, `Anthropic.Message` for responses. Defining your own `interface ChatMessage { role: string; content: unknown }` duplicates what the SDK already provides and loses type safety.
|
||||
|
||||
@@ -182,11 +182,11 @@ For 1-hour TTL: `"cache_control": {"type": "ephemeral", "ttl": "1h"}`. Top-level
|
||||
|
||||
## Extended Thinking
|
||||
|
||||
> **Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.8 and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
|
||||
> **Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Fable 5, Opus 4.8, and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
|
||||
> **Older models:** Use `"type": "enabled"` with `"budget_tokens": N` (must be < `max_tokens`, min 1024).
|
||||
|
||||
```bash
|
||||
# Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
|
||||
# Fable 5 / Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
|
||||
curl https://api.anthropic.com/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "x-api-key: $ANTHROPIC_API_KEY" \
|
||||
|
||||
@@ -29,7 +29,7 @@ client := anthropic.NewClient(
|
||||
|
||||
## Model Constants
|
||||
|
||||
The Go SDK provides typed model constants: `anthropic.ModelClaudeOpus4_8`, `anthropic.ModelClaudeOpus4_7`, `anthropic.ModelClaudeSonnet4_6`, `anthropic.ModelClaudeHaiku4_5_20251001`. Use `ModelClaudeOpus4_8` unless the user specifies otherwise.
|
||||
The Go SDK provides typed model constants: `anthropic.ModelClaudeFable5`, `anthropic.ModelClaudeOpus4_8`, `anthropic.ModelClaudeOpus4_7`, `anthropic.ModelClaudeSonnet4_6`, `anthropic.ModelClaudeHaiku4_5_20251001`. Use `ModelClaudeOpus4_8` unless the user specifies otherwise; if they ask for Fable or the most powerful model, use `anthropic.ModelClaudeFable5` (see `shared/models.md` for the full resolution table).
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -250,11 +250,11 @@ If `cache_read_input_tokens` is zero across repeated identical-prefix requests,
|
||||
|
||||
## Extended Thinking
|
||||
|
||||
> **Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.8 and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
|
||||
> **Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Fable 5, Opus 4.8, and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
|
||||
> **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (must be < `max_tokens`, min 1024).
|
||||
|
||||
```python
|
||||
# Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
|
||||
# Fable 5 / Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
|
||||
response = client.messages.create(
|
||||
model="claude-opus-4-8",
|
||||
max_tokens=16000,
|
||||
@@ -381,7 +381,7 @@ response2 = conversation.send("What's my name?") # Claude remembers "Alice"
|
||||
|
||||
### Compaction (long conversations)
|
||||
|
||||
> **Beta, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.
|
||||
> **Beta, Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
@@ -46,7 +46,7 @@ No final-message accumulation is done for you in this form.
|
||||
|
||||
Claude may return text, thinking blocks, or tool use. Handle each appropriately:
|
||||
|
||||
> **Opus 4.8 / Opus 4.7 / Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.
|
||||
> **Fable 5 / Opus 4.8 / Opus 4.7 / Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.
|
||||
|
||||
```python
|
||||
with client.messages.stream(
|
||||
|
||||
@@ -107,10 +107,12 @@ Some 400 errors are specifically related to parameter validation:
|
||||
- `budget_tokens` >= `max_tokens` in extended thinking
|
||||
- Invalid tool definition schema
|
||||
|
||||
**Model-specific 400s on Opus 4.8 / 4.7:**
|
||||
**Model-specific 400s on Fable 5 / Opus 4.8 / 4.7:**
|
||||
|
||||
- `temperature`, `top_p`, `top_k` are removed — sending any of them returns 400. Delete the parameter; see `shared/model-migration.md` → Per-SDK Syntax Reference.
|
||||
- `thinking: {type: "enabled", budget_tokens: N}` is removed — sending it returns 400. Use `thinking: {type: "adaptive"}` instead.
|
||||
- **Fable 5 only:** an explicit `thinking: {type: "disabled"}` returns 400 (it is accepted on Opus 4.8/4.7). Omit the `thinking` param entirely instead.
|
||||
- **Fable 5 only:** if the organization is set to zero data retention (ZDR) — or any retention below the required 30 days — then **all** Fable 5 requests return `400 invalid_request_error`, even with a perfectly valid payload. Check the org's retention configuration before debugging the request body.
|
||||
|
||||
**Common mistake with extended thinking on older models (Opus 4.6 and earlier):**
|
||||
|
||||
@@ -168,8 +170,10 @@ thinking: budget_tokens=10000, max_tokens=16000
|
||||
|
||||
| Mistake | Error | Fix |
|
||||
| ------------------------------- | ---------------- | ------------------------------------------------------- |
|
||||
| `temperature`/`top_p`/`top_k` on Opus 4.8 / 4.7 | 400 | Remove the parameter (see `shared/model-migration.md`) |
|
||||
| `budget_tokens` on Opus 4.8 / 4.7 | 400 | Use `thinking: {type: "adaptive"}` |
|
||||
| `temperature`/`top_p`/`top_k` on Fable 5 / Opus 4.8 / 4.7 | 400 | Remove the parameter (see `shared/model-migration.md`) |
|
||||
| `budget_tokens` on Fable 5 / Opus 4.8 / 4.7 | 400 | Use `thinking: {type: "adaptive"}` |
|
||||
| `thinking: {type: "disabled"}` on Fable 5 | 400 | Omit the `thinking` param entirely (accepted on Opus 4.8/4.7) |
|
||||
| Org set to ZDR / retention below 30 days (Fable 5) | 400 on every request | Fix the org's data-retention configuration — the payload isn't the problem |
|
||||
| `budget_tokens` >= `max_tokens` (older models) | 400 | Ensure `budget_tokens` < `max_tokens` |
|
||||
| Typo in model ID | 404 | Use valid model ID like `claude-opus-4-8` |
|
||||
| First message is `assistant` | 400 | First message must be `user` |
|
||||
|
||||
@@ -17,6 +17,7 @@ This file contains WebFetch URLs for fetching current information from platform.
|
||||
| --------------- | ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
|
||||
| Models Overview | `https://platform.claude.com/docs/en/about-claude/models/overview.md` | "Extract current model IDs, context windows, and pricing for all Claude models" |
|
||||
| Migration Guide | `https://platform.claude.com/docs/en/about-claude/models/migration-guide.md` | "Extract breaking changes, deprecated parameters, and per-model migration steps when moving to a newer Claude model" |
|
||||
| Introducing Claude Fable 5 | `https://platform.claude.com/docs/en/about-claude/models/introducing-claude-fable-5.md` | "Extract capabilities, API changes, and availability stages for Claude Fable 5 and Claude Mythos 5" |
|
||||
| Pricing | `https://platform.claude.com/docs/en/pricing.md` | "Extract current pricing per million tokens for input and output" |
|
||||
|
||||
### Core Features
|
||||
@@ -129,6 +130,8 @@ WebFetch these when a binding (class, method, namespace, field) isn't covered in
|
||||
| C# | `https://github.com/anthropics/anthropic-sdk-csharp` | "Extract beta managed-agents classes and method signatures (NuGet package, `BetaManagedAgents*` types)" |
|
||||
| PHP | `https://github.com/anthropics/anthropic-sdk-php` | "Extract beta managed-agents classes and method signatures (`$client->beta->agents`, `BetaManagedAgents*` params)" |
|
||||
|
||||
Each SDK repo also ships runnable programs under `examples/` — including the refusal-fallback / `fallbacks` examples (client-side middleware registration, fallback state, server-side `fallbacks` param). Fetch those for exact per-language syntax instead of translating another language's example.
|
||||
|
||||
---
|
||||
|
||||
## Fallback Strategy
|
||||
|
||||
@@ -8,7 +8,7 @@ All endpoints require `x-api-key` and `anthropic-version: 2023-06-01` headers. M
|
||||
anthropic-beta: managed-agents-2026-04-01
|
||||
```
|
||||
|
||||
The SDK adds this header automatically for all `client.beta.{agents,environments,sessions,vaults,memory_stores}.*` calls. Skills endpoints use `skills-2025-10-02`; Files endpoints use `files-api-2025-04-14`.
|
||||
The SDK adds this header automatically for all `client.beta.{agents,environments,sessions,vaults,memory_stores,deployments,deployment_runs}.*` calls. Skills endpoints use `skills-2025-10-02`; Files endpoints use `files-api-2025-04-14`.
|
||||
|
||||
---
|
||||
|
||||
@@ -26,6 +26,8 @@ All resources are under the `beta` namespace. Python and TypeScript share identi
|
||||
| Session Events | `sessions.events.list` / `send` / `stream` | `Sessions.Events.List` / `Send` / `StreamEvents` |
|
||||
| Session Threads | `sessions.threads.list` / `retrieve` / `archive`; `sessions.threads.events.list` / `stream` | `Sessions.Threads.List` / `Get` / `Archive`; `Sessions.Threads.Events.List` / `StreamEvents` |
|
||||
| Session Resources | `sessions.resources.add` / `retrieve` / `update` / `list` / `delete` | `Sessions.Resources.Add` / `Get` / `Update` / `List` / `Delete` |
|
||||
| Deployments | `deployments.create` / `pause` / `unpause` / `archive` / `run` | Not yet documented — WebFetch the SDK repo (`shared/live-sources.md`) |
|
||||
| Deployment Runs | `deployment_runs.list` (TS: `deploymentRuns.list`) | Not yet documented — WebFetch the SDK repo (`shared/live-sources.md`) |
|
||||
| Vaults | `vaults.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Vaults.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |
|
||||
| Credentials | `vaults.credentials.create` / `retrieve` / `update` / `list` / `delete` / `archive` / `mcp_oauth_validate` | `Vaults.Credentials.New` / `Get` / `Update` / `List` / `Delete` / `Archive` / `McpOauthValidate` |
|
||||
| Memory Stores | `memory_stores.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `MemoryStores.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |
|
||||
@@ -113,9 +115,29 @@ Per-subagent event streams in multiagent sessions. See `shared/managed-agents-mu
|
||||
|
||||
For `type: "self_hosted"`, `config` is the bare `{"type": "self_hosted"}` — `networking` and `packages` do not apply.
|
||||
|
||||
## Deployments
|
||||
|
||||
Scheduled deployments (`depl_` IDs) run an agent on a recurring cron schedule — each firing creates a session. See `shared/managed-agents-scheduled-deployments.md` for the conceptual guide (cron/DST semantics, failure behavior, lifecycle).
|
||||
|
||||
| Method | Path | Operation | Description |
|
||||
| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |
|
||||
| `POST` | `/v1/deployments` | CreateDeployment | Create a scheduled deployment |
|
||||
| `POST` | `/v1/deployments/{deployment_id}/pause` | PauseDeployment | Suppress scheduled triggers (reversible; manual runs still allowed) |
|
||||
| `POST` | `/v1/deployments/{deployment_id}/unpause` | UnpauseDeployment | Resume from the next occurrence (no backfill) |
|
||||
| `POST` | `/v1/deployments/{deployment_id}/archive` | ArchiveDeployment | **Terminal** — schedule stops, deployment becomes immutable |
|
||||
| `POST` | `/v1/deployments/{deployment_id}/run` | RunDeployment | Trigger a manual run immediately (`trigger_context.type: "manual"`); works while paused |
|
||||
|
||||
## Deployment Runs
|
||||
|
||||
Each trigger attempt (scheduled or manual) writes a `deployment_run` record (`drun_` IDs) carrying either the created `session_id` or an `error.type` (`environment_archived`, `agent_archived`, `vault_not_found`, `session_rate_limited`, `service_unavailable`).
|
||||
|
||||
| Method | Path | Operation | Description |
|
||||
| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |
|
||||
| `GET` | `/v1/deployment_runs?deployment_id=...` | ListDeploymentRuns | List runs for a deployment (paginated; filter failures with `has_error=true`) |
|
||||
|
||||
## Vaults
|
||||
|
||||
Vaults store MCP credentials that Anthropic manages on your behalf — OAuth credentials with auto-refresh, or static bearer tokens. Attach to sessions via `vault_ids`. See `managed-agents-tools.md` §Vaults for the conceptual guide and credential shapes.
|
||||
Vaults store credentials that Anthropic manages on your behalf — MCP credentials (OAuth with auto-refresh, or static bearer tokens) and `environment_variable` credentials substituted into outbound requests at egress. Attach to sessions via `vault_ids`. See `managed-agents-tools.md` §Vaults for the conceptual guide and credential shapes.
|
||||
|
||||
| Method | Path | Operation | Description |
|
||||
| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |
|
||||
@@ -258,7 +280,7 @@ Immutable per-mutation snapshots (`memver_...`) — the audit and rollback surfa
|
||||
"checkout": { "type": "branch", "name": "main" }
|
||||
}
|
||||
],
|
||||
"vault_ids": ["vlt_abc123 (optional — MCP credentials with auto-refresh)"],
|
||||
"vault_ids": ["vlt_abc123 (optional — vault credentials: MCP auth + environment variables)"],
|
||||
"metadata": {
|
||||
"key": "value"
|
||||
}
|
||||
@@ -286,6 +308,26 @@ Immutable per-mutation snapshots (`memver_...`) — the audit and rollback surfa
|
||||
}
|
||||
```
|
||||
|
||||
### CreateDeployment Request Body
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "Weekly compliance scan",
|
||||
"agent": "agent_abc123 (required — same shapes as CreateSession)",
|
||||
"environment_id": "env_abc123 (required)",
|
||||
"initial_events": [
|
||||
{ "type": "user.message", "content": [{ "type": "text", "text": "Run the weekly compliance scan." }] }
|
||||
],
|
||||
"schedule": {
|
||||
"type": "cron",
|
||||
"expression": "0 20 * * 5",
|
||||
"timezone": "America/New_York"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
> Optional session config (`resources`, `vault_ids`, etc.) is supported the same way as on CreateSession. Response includes `status`, `paused_reason`, and `schedule.upcoming_runs_at` (next fire times). See `shared/managed-agents-scheduled-deployments.md`.
|
||||
|
||||
### SendEvents Request Body
|
||||
|
||||
```json
|
||||
@@ -304,6 +346,8 @@ Immutable per-mutation snapshots (`memver_...`) — the audit and rollback surfa
|
||||
}
|
||||
```
|
||||
|
||||
> `system.message` events (update the system prompt between turns) use the same envelope with `type: "system.message"` — Claude Opus 4.8 only; see `shared/managed-agents-events.md` § Updating the system prompt mid-session.
|
||||
|
||||
### Define Outcome Event
|
||||
|
||||
```json
|
||||
|
||||
@@ -183,7 +183,9 @@ Delete the original via `files.delete(uploaded.id)`; the session-scoped copy is
|
||||
|
||||
## 9. Secrets for non-MCP APIs and CLIs — keep them host-side via custom tools
|
||||
|
||||
**Problem:** you want the agent to call a third-party API or run a CLI that needs a secret (API key, token, service-account credential), but there is currently no way to set environment variables inside the session container, and vaults currently hold MCP credentials only — they are not exposed to the container's shell. So `curl`, installed CLIs, or SDK clients running via the `bash` tool have no first-class place to read a secret from.
|
||||
**Problem:** you want the agent to call a third-party API or run a CLI that needs a secret (API key, token, service-account credential), but you can't or don't want to hand the secret to a vault.
|
||||
|
||||
**First check:** for cloud environments, the first-class answer is now a vault `environment_variable` credential — the agent's shell sees an opaque placeholder and the real secret is substituted at egress. See `shared/managed-agents-tools.md` → Vaults. Use this pattern instead when that doesn't fit: **self-hosted sandboxes** (env-var credentials not yet supported there), clients that reject the placeholder via local format validation, secrets that must never leave your infrastructure, or calls that need host-side binaries.
|
||||
|
||||
**Solution:** move the authenticated call to your side. Declare a custom tool on the agent; when the agent emits `agent.custom_tool_use`, your orchestrator (the process reading the SSE stream) executes the call with its own credentials and responds with `user.custom_tool_result`. The container never sees the key.
|
||||
|
||||
|
||||
@@ -132,7 +132,7 @@ const session = await client.beta.sessions.create(
|
||||
| `environment_id`| string | **Yes** | Environment ID |
|
||||
| `title` | string | No | Human-readable name (appears in logs/dashboards) |
|
||||
| `resources` | array | No | Files, GitHub repos, or memory stores, attached to the container at startup. Memory stores are session-create-only (not addable via `resources.add()`). |
|
||||
| `vault_ids` | array | No | Vault IDs (`vlt_*`) — MCP credentials with auto-refresh. See `shared/managed-agents-tools.md` → Vaults. |
|
||||
| `vault_ids` | array | No | Vault IDs (`vlt_*`) — MCP credentials with auto-refresh + `environment_variable` secrets substituted at egress. See `shared/managed-agents-tools.md` → Vaults. |
|
||||
| `metadata` | object | No | User-provided key-value pairs |
|
||||
|
||||
**Agent configuration fields** (passed to `agents.create()`, not `sessions.create()`):
|
||||
|
||||
@@ -13,6 +13,31 @@ Send events to a session via `POST /v1/sessions/{id}/events`.
|
||||
| `user.tool_confirmation` | Approve/deny a tool call (when `always_ask` policy) |
|
||||
| `user.custom_tool_result` | Provide result for a custom tool call |
|
||||
| `user.define_outcome` | Start a rubric-graded iterate loop — see `shared/managed-agents-outcomes.md` |
|
||||
| `system.message` | Update the agent's system prompt between turns — **Claude Opus 4.8 only**; see § Updating the system prompt mid-session |
|
||||
|
||||
#### Updating the system prompt mid-session (`system.message`)
|
||||
|
||||
Unlike the `system` field on the agent definition (fixed at session creation), a `system.message` event changes the system prompt **as the session progresses** — a different persona, revised constraints, or runtime-fetched context that should shape behavior going forward:
|
||||
|
||||
```python
|
||||
client.beta.sessions.events.send(
|
||||
session.id,
|
||||
events=[
|
||||
{
|
||||
"type": "system.message",
|
||||
"content": [
|
||||
{"type": "text", "text": "The user's current timezone is America/New_York."},
|
||||
],
|
||||
},
|
||||
],
|
||||
)
|
||||
```
|
||||
|
||||
Constraints:
|
||||
|
||||
- **Claude Opus 4.8 only.** If any model configured on the agent does not support mid-conversation system injection, the event is rejected with a `model_does_not_support_mid_conversation_system` validation error.
|
||||
- **Cannot be sent while the session is idle with `stop_reason: requires_action`** (blocked on `user.custom_tool_result` / `user.tool_confirmation`).
|
||||
- `content` accepts 1–1000 text items.
|
||||
|
||||
### Receiving Events
|
||||
|
||||
|
||||
@@ -83,10 +83,10 @@ Emit as `resources: [{type: "file", file_id, mount_path}]`. Max 999 file resourc
|
||||
|
||||
Per-run. Points at the agent + environment, attaches credentials, kicks off.
|
||||
|
||||
**Vault credentials** (if the agent declared MCP servers):
|
||||
**Vault credentials** (if the agent declared MCP servers, or the job needs an API key for a CLI/SDK/direct API call):
|
||||
- [ ] Existing vault, or create one? (`client.beta.vaults.create()` + `vaults.credentials.create()`)
|
||||
|
||||
Credentials are write-only, matched to MCP servers by URL, auto-refreshed. See `shared/managed-agents-tools.md` → Vaults.
|
||||
Credentials are write-only. MCP credentials are matched to MCP servers by URL and auto-refreshed; `environment_variable` credentials are substituted into outbound requests at egress (the sandbox sees only a placeholder). See `shared/managed-agents-tools.md` → Vaults.
|
||||
|
||||
**Kickoff — pick one:**
|
||||
- [ ] **Conversational:** a first `user.message` to the agent.
|
||||
|
||||
@@ -25,11 +25,11 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
|
||||
|
||||
| Beta Header | What it enables |
|
||||
| ------------------------------ | ---------------------------------------------------- |
|
||||
| `managed-agents-2026-04-01` | Agents, Environments, Sessions, Events, Session Resources, Session Threads, Outcomes, Multiagent, Vaults, Credentials, Memory Stores |
|
||||
| `managed-agents-2026-04-01` | Agents, Environments, Sessions, Events, Session Resources, Session Threads, Outcomes, Multiagent, Vaults, Credentials, Memory Stores, Deployments |
|
||||
| `skills-2025-10-02` | Skills API (for managing custom skill definitions) |
|
||||
| `files-api-2025-04-14` | Files API for file uploads |
|
||||
|
||||
**Which beta header goes where:** The SDK sets `managed-agents-2026-04-01` automatically on `client.beta.{agents,environments,sessions,vaults,memory_stores}.*` calls, and `files-api-2025-04-14` / `skills-2025-10-02` automatically on `client.beta.files.*` / `client.beta.skills.*` calls. You do NOT need to add the Skills or Files beta header when calling Managed Agents endpoints. **Exception — session-scoped file listing:** `client.beta.files.list({scope_id: session.id})` is a Files endpoint that takes a Managed Agents parameter, so it needs **both** headers. Pass `betas: ["managed-agents-2026-04-01"]` explicitly on that call (the SDK adds the Files header; you add the Managed Agents one). See `shared/managed-agents-environments.md` → Session outputs.
|
||||
**Which beta header goes where:** The SDK sets `managed-agents-2026-04-01` automatically on `client.beta.{agents,environments,sessions,vaults,memory_stores,deployments,deployment_runs}.*` calls, and `files-api-2025-04-14` / `skills-2025-10-02` automatically on `client.beta.files.*` / `client.beta.skills.*` calls. You do NOT need to add the Skills or Files beta header when calling Managed Agents endpoints. **Exception — session-scoped file listing:** `client.beta.files.list({scope_id: session.id})` is a Files endpoint that takes a Managed Agents parameter, so it needs **both** headers. Pass `betas: ["managed-agents-2026-04-01"]` explicitly on that call (the SDK adds the Files header; you add the Managed Agents one). See `shared/managed-agents-environments.md` → Session outputs.
|
||||
|
||||
|
||||
## Reading Guide
|
||||
@@ -53,14 +53,15 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
|
||||
| Upload files / attach repos | `shared/managed-agents-environments.md` (Resources) |
|
||||
| Give agents persistent memory across sessions | `shared/managed-agents-memory.md` — memory stores, `memory_store` session resource, preconditions, versions/redact |
|
||||
| Define agents/environments as version-controlled YAML; drive the API from the shell | `shared/anthropic-cli.md` — `ant beta:agents create < agent.yaml`, `--transform`, `@file` inlining |
|
||||
| Store MCP credentials | `shared/managed-agents-tools.md` (Vaults section) |
|
||||
| Call a non-MCP API / CLI that needs a secret | `shared/managed-agents-client-patterns.md` Pattern 9 — no container env vars; vaults are MCP-only; keep the secret host-side via a custom tool |
|
||||
| Store credentials (MCP auth, API keys for CLIs/SDKs) | `shared/managed-agents-tools.md` (Vaults section) — `mcp_oauth` / `static_bearer` / `environment_variable` |
|
||||
| Call a non-MCP API / CLI that needs a secret | `shared/managed-agents-tools.md` (Vaults section) — `environment_variable` credential, substituted at egress. If that doesn't fit (e.g. self-hosted sandboxes), `shared/managed-agents-client-patterns.md` Pattern 9 keeps the secret host-side via a custom tool |
|
||||
| Run an agent on a recurring cron schedule | `shared/managed-agents-scheduled-deployments.md` — deployments, deployment runs, pause/auto-pause |
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
- **Agent FIRST, then session — NO EXCEPTIONS** — the session's `agent` field accepts **only** a string ID or `{type: "agent", id, version}`. `model`, `system`, `tools`, `mcp_servers`, `skills` are **top-level fields on `POST /v1/agents`**, never on `sessions.create()`. If the user hasn't created an agent, that is step zero of every example.
|
||||
- **Agent ONCE, not every run** — `agents.create()` is a setup step. Store the returned `agent_id` and reuse it; don't call `agents.create()` at the top of your hot path. If the agent's config needs to change, `POST /v1/agents/{id}` — each update creates a new version, and sessions can pin to a specific version for reproducibility.
|
||||
- **MCP auth goes through vaults** — the agent's `mcp_servers` array declares `{type, name, url}` only (no auth). Credentials live in vaults (`client.beta.vaults.credentials.create`) and attach to sessions via `vault_ids`. Anthropic auto-refreshes OAuth tokens using the stored refresh token.
|
||||
- **MCP auth goes through vaults** — the agent's `mcp_servers` array declares `{type, name, url}` only (no auth). Credentials live in vaults (`client.beta.vaults.credentials.create`) and attach to sessions via `vault_ids`. Anthropic auto-refreshes OAuth tokens using the stored refresh token. Vaults also hold `environment_variable` credentials for non-MCP services (CLIs, SDKs, direct API calls) — substituted at egress, never visible in the sandbox.
|
||||
- **Reconcile resources before the first run** — a session with a clear ask but a missing tool, credential, data mount, or context will discover the gap mid-run, then flail and give up. Before creating the session, check that every action in the task maps to a configured tool/MCP server, every MCP server has a vault credential, and every referenced file/host is mounted/reachable. When helping a user set one up, run the reconciliation in `shared/managed-agents-onboarding.md` → §3 Pre-flight viability check.
|
||||
- **Stream to get events** — `GET /v1/sessions/{id}/events/stream` is the primary way to receive agent output in real-time.
|
||||
- **SSE stream has no replay — reconnect with consolidation** — if the stream drops while a `agent.tool_use`, `agent.mcp_tool_use`, or `agent.custom_tool_use` is pending resolution (`user.tool_confirmation` for the first two, `user.custom_tool_result` for the last one), the session deadlocks (client disconnects → session idles → reconnect happens → no client resolution happens). On every (re)connect: open stream with `GET /v1/sessions/{id}/events/stream` , fetch `GET /v1/sessions/{id}/events`, dedupe by event ID, then proceed. See `shared/managed-agents-events.md` → Reconnecting after a dropped stream.
|
||||
|
||||
144
skills/claude-api/shared/managed-agents-scheduled-deployments.md
Normal file
144
skills/claude-api/shared/managed-agents-scheduled-deployments.md
Normal file
@@ -0,0 +1,144 @@
|
||||
# Managed Agents — Scheduled Deployments
|
||||
|
||||
A **scheduled deployment** runs an agent on a recurring cron schedule — each firing creates a session autonomously. Use it for predictable-cadence work: nightly triage, weekly compliance scans, hourly monitors.
|
||||
|
||||
Requires the `managed-agents-2026-04-01` beta header (the SDK sets it automatically for `client.beta.deployments.*` / `client.beta.deployment_runs.*` calls).
|
||||
|
||||
## Create a deployment
|
||||
|
||||
A deployment bundles everything a session needs (agent, environment, optional files / GitHub / memory stores / vaults) plus a `schedule` and the `initial_events` that kick off each run:
|
||||
|
||||
- `agent` and `environment_id` are required — same shapes as `sessions.create` (see `shared/managed-agents-core.md`).
|
||||
- `initial_events` must contain the starting `user.message`.
|
||||
- `schedule` takes a cron `expression` and an IANA `timezone`. Minute-level granularity is the maximum.
|
||||
|
||||
```bash
|
||||
curl -fsSL https://api.anthropic.com/v1/deployments \
|
||||
-H "x-api-key: $ANTHROPIC_API_KEY" \
|
||||
-H "anthropic-version: 2023-06-01" \
|
||||
-H "anthropic-beta: managed-agents-2026-04-01" \
|
||||
-H "content-type: application/json" \
|
||||
-d @- <<EOF
|
||||
{
|
||||
"name": "Weekly compliance scan",
|
||||
"agent": "$AGENT_ID",
|
||||
"environment_id": "$ENVIRONMENT_ID",
|
||||
"initial_events": [
|
||||
{"type": "user.message", "content": [{"type": "text", "text": "Run the weekly compliance scan."}]}
|
||||
],
|
||||
"schedule": {
|
||||
"type": "cron",
|
||||
"expression": "0 20 * * 5",
|
||||
"timezone": "America/New_York"
|
||||
}
|
||||
}
|
||||
EOF
|
||||
```
|
||||
|
||||
```python
|
||||
deployment = client.beta.deployments.create(
|
||||
name="Weekly compliance scan",
|
||||
agent=agent.id,
|
||||
environment_id=environment.id,
|
||||
initial_events=[
|
||||
{
|
||||
"type": "user.message",
|
||||
"content": [{"type": "text", "text": "Run the weekly compliance scan."}],
|
||||
},
|
||||
],
|
||||
schedule={
|
||||
"type": "cron",
|
||||
"expression": "0 20 * * 5",
|
||||
"timezone": "America/New_York",
|
||||
},
|
||||
)
|
||||
```
|
||||
|
||||
The response is a deployment object (`depl_` ID prefix). Check `schedule.upcoming_runs_at` — the next fire times — to confirm the schedule parses the way you intended:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "depl_01xyz",
|
||||
"status": "active",
|
||||
"paused_reason": null,
|
||||
"schedule": {
|
||||
"type": "cron",
|
||||
"expression": "0 20 * * 5",
|
||||
"timezone": "America/New_York",
|
||||
"last_run_at": null,
|
||||
"upcoming_runs_at": ["2026-05-09T00:00:00Z", "2026-05-16T00:00:00Z", "2026-05-23T00:00:00Z"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Deployments may apply up to **10 seconds of jitter** to distribute load. Maximum **1000 scheduled deployments per organization** (contact Anthropic support for more).
|
||||
|
||||
### Cron and timezone semantics
|
||||
|
||||
- **Expression:** standard POSIX cron (`minute hour day-of-month month day-of-week`).
|
||||
- **Timezone:** IANA identifier (e.g. `"America/Los_Angeles"`).
|
||||
- **DST:** literal wall-clock matching — `"0 20 * * *"` in `America/New_York` fires at 8:00 PM local regardless of EST/EDT.
|
||||
|
||||
> ⚠️ **DST edge:** wall-clock times that don't exist on a spring-forward day (e.g. 2AM) are **skipped**; times that occur twice on a fall-back day **fire twice**. Schedule outside the 1–3AM local window, or use UTC, when missed or duplicate executions are unacceptable.
|
||||
|
||||
## Deployment runs
|
||||
|
||||
Every trigger attempt — successful or not — writes a **deployment run** record (`drun_` prefix), so you can audit failures independent of the session lifecycle. A successful run carries the created `session_id`; follow that session via the event stream (`shared/managed-agents-events.md`) or webhooks (`shared/managed-agents-webhooks.md`) as usual. A failed run carries an `error` whose `type` explains why session creation was rejected.
|
||||
|
||||
```python
|
||||
# All runs for a deployment
|
||||
for run in client.beta.deployment_runs.list(deployment_id=deployment.id):
|
||||
print(run.created_at, run.session_id or run.error.type)
|
||||
|
||||
# Failures only
|
||||
for run in client.beta.deployment_runs.list(deployment_id=deployment.id, has_error=True):
|
||||
print(run.created_at, run.error.type, run.error.message)
|
||||
```
|
||||
|
||||
```typescript
|
||||
for await (const run of client.beta.deploymentRuns.list({
|
||||
deployment_id: deployment.id,
|
||||
has_error: true,
|
||||
})) {
|
||||
console.log(run.created_at, run.error?.type, run.error?.message);
|
||||
}
|
||||
```
|
||||
|
||||
Raw HTTP: `GET /v1/deployment_runs?deployment_id=...&has_error=true`.
|
||||
|
||||
A failed run looks like:
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "deployment_run",
|
||||
"id": "drun_01abc124",
|
||||
"deployment_id": "depl_01xyz",
|
||||
"trigger_context": { "type": "schedule", "scheduled_at": "2026-05-09T00:00:00Z" },
|
||||
"session_id": null,
|
||||
"error": { "type": "environment_archived", "message": "environment `env_01abc` is archived" },
|
||||
"agent": { "type": "agent", "id": "agent_01ghi789", "version": 3 },
|
||||
"created_at": "2026-05-09T00:00:01Z"
|
||||
}
|
||||
```
|
||||
|
||||
Error types include `environment_archived`, `agent_archived`, `vault_not_found`, `session_rate_limited`, and `service_unavailable`.
|
||||
|
||||
## Lifecycle: pause / unpause / archive
|
||||
|
||||
| Operation | SDK | Effect |
|
||||
|---|---|---|
|
||||
| Pause | `client.beta.deployments.pause(id)` | Suppresses scheduled triggers go-forward. Sessions already running continue. **Manual runs are still permitted while paused.** Sets `paused_reason: {"type": "manual"}`. |
|
||||
| Unpause | `client.beta.deployments.unpause(id)` | Resumes from the next scheduled occurrence. **Missed triggers are not backfilled.** Clears `paused_reason`. |
|
||||
| Archive | `client.beta.deployments.archive(id)` | **Terminal** — the schedule stops and the deployment can no longer be modified. Use pause for anything reversible. |
|
||||
|
||||
Raw HTTP: `POST /v1/deployments/{deployment_id}/pause` (likewise `/unpause`, `/archive`).
|
||||
|
||||
### Failure behavior
|
||||
|
||||
- **Rate-limited:** recorded immediately as a `session_rate_limited` run, **no retry** — the schedule simply tries again at the next occurrence. (Rate limits on API calls *inside* a session are handled by the session itself.)
|
||||
- **Other failed runs** (e.g. `environment_archived`, `vault_not_found`, `service_unavailable`): the run records the `error.type` — monitor runs and fix the referenced resource, or pause the deployment.
|
||||
- **Agent archived or deleted:** the deployment is automatically **archived** (terminal) and no further sessions are created.
|
||||
|
||||
## Manual runs
|
||||
|
||||
`POST /v1/deployments/{deployment_id}/run` (SDK: `client.beta.deployments.run(id)`) creates a session immediately and writes a run with `trigger_context.type: "manual"`. Use it to **test a deployment before committing to the schedule** — and remember it works even while the deployment is paused.
|
||||
@@ -156,6 +156,7 @@ These are **control-plane** calls — authenticate with `x-api-key` (not the env
|
||||
| Container lifecycle, hardening, networking | Anthropic | **You** — run non-root, read-only rootfs, drop caps; egress is whatever your VPC/firewall allows |
|
||||
| `file` / `github_repository` resource mounting | Anthropic mounts into the container | **You** — pass pointers via `sessions.create(metadata={...})` and have your orchestrator fetch/clone before dispatch |
|
||||
| `memory_store` resources | Supported | **Not yet supported** |
|
||||
| Vault `environment_variable` credentials | Supported (substituted at Anthropic-managed egress) | **Not yet supported** — egress is yours, so there's nowhere to substitute the secret. Use MCP credentials or a host-side custom tool (`shared/managed-agents-client-patterns.md` Pattern 9) |
|
||||
| Built-in tools | Via `agent_toolset_20260401` | Supplied by your worker (`EnvironmentWorker` default / `beta_agent_toolset(env)` / `ant` CLI fixed set) |
|
||||
| Skills download | Automatic | `EnvironmentWorker` / `AgentToolContext` fetch into `{workdir}/skills/` (needs `client` + `session_id`) |
|
||||
| Claude Platform on AWS | Supported | **Not available** |
|
||||
|
||||
@@ -190,9 +190,14 @@ This keeps secrets out of reusable agent definitions. Each vault credential is t
|
||||
|
||||
> ⚠️ **MCP auth tokens ≠ REST API tokens.** Hosted MCP servers (`mcp.notion.com`, `mcp.linear.app`, etc.) typically require **OAuth bearer tokens**, not the service's native API keys. A Notion `ntn_` integration token authenticates against Notion's REST API but will **not** work as a vault credential for the Notion MCP server. These are different auth systems.
|
||||
|
||||
### Vaults — the MCP credential store
|
||||
### Vaults — the credential store
|
||||
|
||||
**Vaults** store OAuth credentials (access token + refresh token) that Anthropic auto-refreshes on your behalf via standard OAuth 2.0 `refresh_token` grant. This is the only way to authenticate MCP servers in the launch SDK.
|
||||
**Vaults** store credentials that Anthropic manages on your behalf. Two credential categories:
|
||||
|
||||
- **MCP credentials** (`mcp_oauth`, `static_bearer`) — keyed by `mcp_server_url`. When the agent connects to a server at that URL, the token is injected automatically. `mcp_oauth` tokens are auto-refreshed via the standard OAuth 2.0 `refresh_token` grant. This is the only way to authenticate MCP servers.
|
||||
- **Environment variables** (`environment_variable`) — keyed by `secret_name` (the env var name). The sandbox sees only an **opaque placeholder**; the real secret is substituted into the outbound request **at egress**. Use this for any service that authenticates through an environment variable: CLIs (`aws`, `gcloud`, `stripe`), SDKs, or direct `curl` calls from the `bash` tool.
|
||||
|
||||
Secret fields you supply (`token`, `access_token`, `refresh_token`, `client_secret`, `secret_value`) are write-only — never returned in API responses.
|
||||
|
||||
#### Credentials and the sandbox
|
||||
|
||||
@@ -200,11 +205,9 @@ Vaults store credentials; those credentials **never enter the sandbox**. This is
|
||||
|
||||
- **MCP tool calls** are routed through an Anthropic-side proxy that fetches the credential from the vault and adds it to the outbound request.
|
||||
- **Git operations on attached GitHub repositories** (`git pull`, `git push`, GitHub REST calls) are routed through a git proxy that injects the `github_repository` resource's `authorization_token` the same way.
|
||||
- **Environment-variable credentials** appear in the sandbox as an opaque placeholder; the real value replaces the placeholder at egress, on requests to the credential's allowed hosts only.
|
||||
|
||||
**Not yet supported:** running other authenticated CLIs (e.g. `aws`, `gcloud`, `stripe`) directly inside the sandbox. There is currently no way to set container environment variables or expose vault credentials to arbitrary processes. If you need one of these today:
|
||||
|
||||
- **Prefer an MCP server** for that service if one exists — it gets the same vault-backed injection.
|
||||
- **Otherwise, register a custom tool:** the agent emits `agent.custom_tool_use`, your orchestrator (which already holds the credential) executes the call and returns `user.custom_tool_result` over the same authenticated event stream. No public endpoint is exposed; the sandbox never sees the secret. See `shared/managed-agents-client-patterns.md` → Pattern 9.
|
||||
**When vault credentials don't fit** (e.g. self-hosted sandboxes — `environment_variable` is not yet supported there), **register a custom tool:** the agent emits `agent.custom_tool_use`, your orchestrator (which already holds the credential) executes the call and returns `user.custom_tool_result` over the same authenticated event stream. No public endpoint is exposed; the sandbox never sees the secret. See `shared/managed-agents-client-patterns.md` → Pattern 9.
|
||||
|
||||
**Do not put API keys in the system prompt or user messages as a workaround** — they persist in the session's event history.
|
||||
|
||||
@@ -213,11 +216,11 @@ Vaults store credentials; those credentials **never enter the sandbox**. This is
|
||||
**Flow:**
|
||||
|
||||
1. Create a vault (`client.beta.vaults.create(...)`) — one per tenant/user, or one shared, depending on your model
|
||||
2. Add MCP credentials to it (`client.beta.vaults.credentials.create(...)`) — each credential is tied to one MCP server URL
|
||||
2. Add credentials to it (`client.beta.vaults.credentials.create(...)`) — MCP credentials are keyed by MCP server URL; environment-variable credentials by `secret_name`
|
||||
3. Reference the vault on session create via `vault_ids: ["vlt_..."]`
|
||||
4. Anthropic auto-refreshes tokens before they expire; the agent uses the current access token when calling MCP tools
|
||||
4. Anthropic auto-refreshes OAuth tokens before they expire and substitutes secrets at runtime
|
||||
|
||||
**Credential shape**:
|
||||
**MCP OAuth credential shape**:
|
||||
|
||||
```json
|
||||
{
|
||||
@@ -249,6 +252,40 @@ Omit `refresh` entirely if you only have an access token with no refresh capabil
|
||||
|
||||
> 💡 **Getting an OAuth token.** How you obtain the initial access and refresh tokens depends on the MCP server — consult its documentation. Once you have them, store them in a vault credential using the shape above; Anthropic auto-refreshes via the `refresh.token_endpoint` from there.
|
||||
|
||||
**Environment-variable credential shape**:
|
||||
|
||||
```json
|
||||
{
|
||||
"display_name": "Twilio API key for sandbox",
|
||||
"auth": {
|
||||
"type": "environment_variable",
|
||||
"secret_name": "TWILIO_API_KEY",
|
||||
"secret_value": "sk-your-secret-here",
|
||||
"networking": {
|
||||
"type": "limited",
|
||||
"allowed_hosts": ["api.twilio.com", "*.twilio.com"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`networking.allowed_hosts` controls which outbound hosts the secret can be substituted for — `{"type": "limited", "allowed_hosts": [...]}` or `{"type": "unrestricted"}` if you can't enumerate the domains in advance. Limiting is strongly recommended: it prevents the key from ever being sent to unauthorized hosts.
|
||||
|
||||
> ⚠️ **Two networking layers, both required.** `networking.allowed_hosts` on the credential controls which requests *use the secret*, not which requests are *allowed*. The agent must also be able to reach the domain at the **environment level** (`unrestricted`, or the host listed in the environment's `allowed_hosts` — see `shared/managed-agents-environments.md`). A domain missing from either layer means the secret-substituted request fails.
|
||||
|
||||
> ⚠️ **Client-side validation caveat.** Substitution happens at egress, not inside the sandbox — clients that validate the credential *format* locally before making a network request (e.g. a CLI that checks the key starts with `sk-`) will see the opaque placeholder and may fail at startup. If a client rejects the credential before any network call, that's why.
|
||||
|
||||
> 💡 **Scope the key minimally.** The agent can do anything the key allows; a key with broader permissions than the task needs increases the blast radius if the agent behaves unexpectedly.
|
||||
|
||||
**Not supported with self-hosted sandboxes** — `environment_variable` credentials require Anthropic-managed egress. See `shared/managed-agents-self-hosted-sandboxes.md`.
|
||||
|
||||
**Constraints (all credential types):**
|
||||
|
||||
- **Unique key per vault.** `mcp_server_url` (MCP credentials) and `secret_name` (environment-variable credentials) must be unique among active credentials in a vault; duplicates return a 409.
|
||||
- **Keys are immutable.** Secret values and `display_name` can be updated (rotation); to change `mcp_server_url`, `secret_name`, `token_endpoint`, or `client_id`, archive the credential and create a new one. Archiving purges the secret and frees the key for a replacement.
|
||||
- **Maximum 20 credentials per vault.**
|
||||
- Credentials are stored as provided and **not validated until session runtime** — an invalid credential surfaces as an authentication or downstream error during the session, which is emitted but does not block the session from continuing.
|
||||
|
||||
**Scoping:** Vaults are workspace-scoped. Anyone with developer+ role in the API workspace can create, read (metadata only — secrets are write-only), and attach vaults. `vault_ids` can be set at session **create** time but not via session update (the SDK docstring says "Not yet supported; requests setting this field are rejected").
|
||||
|
||||
---
|
||||
|
||||
@@ -19,6 +19,8 @@ For the latest, authoritative version (with code samples in every supported lang
|
||||
| Opus 4.7 Migration Checklist | The required vs optional items for 4.7, tagged `[BLOCKS]` / `[TUNE]` |
|
||||
| Migrating to Opus 4.8 | Migrating to Opus 4.8 (no new breaking changes; mid-session system prompts; behavioral re-tuning) |
|
||||
| Opus 4.8 Migration Checklist | The required vs optional items for 4.8, tagged `[BLOCKS]` / `[TUNE]` |
|
||||
| Migrating to Claude Fable 5 | Migrating to Claude Fable 5 or Claude Mythos 5 (always-on protected thinking, new tokenizer, refusal handling, data retention, behavioral shifts + prompting guidance) |
|
||||
| Claude Fable 5 Migration Checklist | The required vs optional items for Claude Fable 5, tagged `[BLOCKS]` / `[TUNE]` |
|
||||
| Verify the Migration | After edits — runtime spot-check |
|
||||
|
||||
**TL;DR:** Change the model ID string. If you were using `budget_tokens`, switch to `thinking: {type: "adaptive"}`. If you were using assistant prefills, they 400 on both Opus 4.6 and Sonnet 4.6 — switch to one of the prefill replacements (most often `output_config.format`; see the table in Breaking Changes by Source Model). If you're moving from Sonnet 4.5 to Sonnet 4.6, set `effort` explicitly — 4.6 defaults to `high`. Remove the `effort-2025-11-24` and `fine-grained-tool-streaming-2025-05-14` beta headers (GA on 4.6); remove `interleaved-thinking-2025-05-14` once you're on adaptive thinking (keep it only while using the transitional `budget_tokens` escape hatch). Then drop back from `client.beta.messages.create` to `client.messages.create`. Dial back any aggressive "CRITICAL: YOU MUST" tool instructions; 4.6 follows the system prompt much more closely.
|
||||
@@ -178,7 +180,8 @@ If you're applying several prompt-tuning edits at once, offer them as a short li
|
||||
|
||||
| If you're on… | Migrate to | Why |
|
||||
| ------------------------------------- | ------------------ | ------------------------------------------------- |
|
||||
| Opus 4.7 | `claude-opus-4-8` | Most capable model; same API surface as 4.7 (no new breaking changes) — mostly prompt re-tuning; see Migrating to Opus 4.8 |
|
||||
| Claude Mythos Preview (`claude-mythos-preview`) | `claude-mythos-5` (Project Glasswing successor) or `claude-fable-5` (GA) | Same tokenizer family — mostly a model-ID swap; remove `thinking` config and prefill; see Migrating to Claude Fable 5 |
|
||||
| Opus 4.7 | `claude-opus-4-8` | Most capable Opus-tier model; same API surface as 4.7 (no new breaking changes) — mostly prompt re-tuning; see Migrating to Opus 4.8 |
|
||||
| Opus 4.6 | `claude-opus-4-8` | Apply the Opus 4.7 breaking changes, then the 4.8 re-tuning |
|
||||
| Opus 4.0 / 4.1 / 4.5 / Opus 3 | `claude-opus-4-8` | Apply 4.6 → 4.7 → 4.8 in order (adaptive thinking, drop sampling params, then re-tune) |
|
||||
| Sonnet 4.0 / 4.5 / 3.7 / 3.5 | `claude-sonnet-4-6`| Best speed / intelligence balance; adaptive thinking; 64K output |
|
||||
@@ -481,6 +484,7 @@ If the model is now overtriggering a tool or skill, the fix is almost always to
|
||||
| `claude-opus-4-5` | `claude-opus-4-8` |
|
||||
| `claude-opus-4-1` | `claude-opus-4-8` |
|
||||
| `claude-opus-4-0` | `claude-opus-4-8` |
|
||||
| `claude-mythos-preview` | `claude-mythos-5` (Project Glasswing) or `claude-fable-5` |
|
||||
| `claude-sonnet-4-5` | `claude-sonnet-4-6`|
|
||||
| `claude-sonnet-4-0` | `claude-sonnet-4-6`|
|
||||
|
||||
@@ -790,7 +794,7 @@ Every item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, sil
|
||||
|
||||
> **Model ID `claude-opus-4-8` is authoritative as written here.** When the user asks to migrate to Opus 4.8, write `model="claude-opus-4-8"` exactly. Do **not** WebFetch to verify — this guide is the source of truth for migration target IDs. The corresponding entry exists in `shared/models.md`.
|
||||
|
||||
Claude Opus 4.8 is our most capable generally available model to date — highly autonomous, with state-of-the-art long-horizon agentic execution, knowledge work, and memory. It is layered on top of the Opus 4.7 migration above. If the caller is jumping from Opus 4.6 or older, apply the 4.6 and 4.7 sections first, then this one.
|
||||
Claude Opus 4.8 is our most capable Opus-tier model — highly autonomous, with state-of-the-art long-horizon agentic execution, knowledge work, and memory. It is layered on top of the Opus 4.7 migration above. If the caller is jumping from Opus 4.6 or older, apply the 4.6 and 4.7 sections first, then this one.
|
||||
|
||||
**No new breaking changes.** Opus 4.8 keeps the same request surface as Opus 4.7. The same calls that already work on 4.7 work unchanged on 4.8 — adaptive thinking only (`thinking: {type: "enabled", budget_tokens: N}` still 400s; use `{type: "adaptive"}`), sampling parameters (`temperature`, `top_p`, `top_k`) still rejected, last-assistant-turn prefills still 400, `thinking.display` still defaults to `"omitted"`, and the `low`/`medium`/`high`/`xhigh`/`max` effort levels, Task Budgets (beta), and high-resolution vision all behave as on 4.7. A 4.7 → 4.8 migration is therefore **the model-ID swap plus prompt re-tuning** — there is no required code edit beyond the model string.
|
||||
|
||||
@@ -882,9 +886,237 @@ For a caller **already on Opus 4.7**, only the first item is required; everythin
|
||||
|
||||
---
|
||||
|
||||
## Migrating to Claude Fable 5
|
||||
|
||||
> **Model IDs `claude-fable-5` and `claude-mythos-5` are authoritative as written here.** When the user asks to migrate to Claude Fable 5, write `model="claude-fable-5"` exactly; a Mythos Preview migrator in Project Glasswing writes `model="claude-mythos-5"` (everyone else: `claude-fable-5`). Do **not** WebFetch to verify — this guide is the source of truth for migration target IDs. The corresponding entries exist in `shared/models.md`.
|
||||
|
||||
Claude Fable 5 is Anthropic's most capable widely released model — for the most demanding reasoning and long-horizon agentic work. **Claude Mythos 5** (`claude-mythos-5`) offers the same capabilities, pricing, and API behavior through Project Glasswing (participation is the only way to access it), and succeeds the invitation-only **Claude Mythos Preview** (`claude-mythos-preview`). Everything in this section applies to both models — only the ID differs. Mythos Preview migrators in Project Glasswing target `claude-mythos-5`; everyone else targets `claude-fable-5`. 1M token context window by default (the maximum is also the default), up to 128K output tokens per request.
|
||||
|
||||
**Migrate to Claude Fable 5 only when the user explicitly chose it.** It is not the default Opus upgrade path — pricing is above Opus-tier and the new tokenizer changes cost baselines. For "upgrade to the latest model" requests, the target remains `claude-opus-4-8`.
|
||||
|
||||
### Breaking changes (vs Opus-tier and Mythos Preview)
|
||||
|
||||
1. **Thinking is always on — remove all `thinking` configuration.** Adaptive thinking applies automatically whenever the `thinking` parameter is unset (an explicit `{type: "adaptive"}` is also accepted). Any other configuration is rejected: `thinking: {type: "disabled"}` and `{type: "enabled", budget_tokens: N}` both return a 400. `budget_tokens` has no replacement — the `output_config.effort` parameter is a separate output-level control, not a thinking budget.
|
||||
|
||||
```python
|
||||
# Before (Mythos Preview / older models)
|
||||
client.messages.create(
|
||||
model="claude-mythos-preview",
|
||||
max_tokens=16000,
|
||||
thinking={"type": "enabled", "budget_tokens": 10000},
|
||||
messages=[...],
|
||||
)
|
||||
|
||||
# After (Claude Fable 5) — no thinking field at all
|
||||
client.messages.create(
|
||||
model="claude-fable-5",
|
||||
max_tokens=16000,
|
||||
output_config={"effort": "high"},
|
||||
messages=[...],
|
||||
)
|
||||
```
|
||||
|
||||
2. **Assistant prefill is not supported.** Replace last-assistant-turn prefills with structured outputs (`output_config.format`) or system prompt instructions — same replacement patterns as the 4.6-family prefill removal above. (One exception: the fallback-credit prefill claim — the server accepts the echoed assistant message when redeeming a credit; see the refusal section below.)
|
||||
|
||||
3. **Interleaved scratchpad is not supported** (Mythos Preview migrators only). Inter-tool reasoning is returned in thinking blocks instead, which adaptive thinking produces automatically between tool calls.
|
||||
|
||||
### Protected thinking — always encrypted, model-specific
|
||||
|
||||
Claude Fable 5's `protected_thinking` policy protects the **raw chain of thought** — it is never exposed in responses. What you receive are **regular `thinking` blocks**, not encrypted blobs or `redacted_thinking`: `display: "summarized"` returns a readable summary of the reasoning, and with `"omitted"` — the default, same as Opus 4.8/4.7 — responses still include `thinking` blocks but the `thinking` field is an empty string. `display` controls visibility only; thinking happens and is billed the same under every setting. What's stricter on Claude Fable 5 is **replay**: pass thinking blocks back to the API **unchanged** when continuing a conversation on the same model (the standard multi-turn pattern; dropping or editing them breaks the turn).
|
||||
|
||||
When continuing on the same model, pass each thinking block back **exactly as received — including blocks whose `thinking` text is empty**. The API rejects blocks whose content has been *modified*, not blocks you have read; displaying the summary is fine, editing or reconstructing blocks is not.
|
||||
|
||||
Thinking blocks are tied to the model that produced them, but cross-model replay is forgiving: other models **silently ignore** them rather than rejecting the request (early-access builds returned `invalid_request_error`; that was reverted before launch). Ignored blocks still bill input tokens, though — so when switching models for good, e.g. after a classifier refusal, strip `thinking`/`redacted_thinking` blocks from prior assistant turns to avoid paying for dead weight. Two exceptions: fallback-credit retries must echo the refused body **unchanged**, and `fallback` blocks from a mid-output fallback stay where they appeared.
|
||||
|
||||
Related: a request that tries to elicit the model's internal reasoning *in the response text* can be refused with `stop_details.category: "reasoning_extraction"` — applications needing reasoning visibility should read the summarized `thinking` blocks instead of prompting for reasoning.
|
||||
|
||||
### New tokenizer — re-baseline tokens and cost
|
||||
|
||||
Claude Fable 5 uses a new tokenizer. The same content tokenizes to **roughly 30% more tokens** than on Opus-tier and older models (varies by content and workload shape). Billing is per token, so an unchanged workload can cost more after migration even before the per-token price difference.
|
||||
|
||||
- Coming **from `claude-mythos-preview`**: token counts are roughly unchanged (same tokenizer family).
|
||||
- Coming **from Opus/Sonnet/Haiku**: do not reuse token counts, context-window budgets, or `max_tokens` settings measured on the old model.
|
||||
|
||||
The token counting endpoint returns counts under **both** tokenizers when you pass `model: "claude-fable-5"` — `input_tokens` (new tokenizer, what you're billed) plus `input_tokens_prior_tokenizer` (the same request under the prior-generation tokenizer) — so you can measure the delta on your own prompts before switching.
|
||||
|
||||
### `refusal` stop reason — handle before reading content
|
||||
|
||||
Claude Fable 5 runs safety classifiers on incoming requests, targeting research biology and most cybersecurity content (Claude Fable 5 is not intended for those domains); benign adjacent work — security tooling, life-sciences tasks — can occasionally trigger false positives, which is why the fallback patterns below matter even for legitimate workloads. (Most Claude consumer surfaces ship with built-in Opus 4.8 fallbacks; API callers configure their own.) A declined request returns a **successful HTTP 200** with `stop_reason: "refusal"`, plus a `stop_details` object with the policy category (`"cyber"`, `"bio"`, `"reasoning_extraction"`, or `null` — treat `null` as a permanent valid state). **Branch on `stop_reason`, never on `stop_details`** — `stop_details` is informational and can be `null` even on a refusal, and `explanation` is not guaranteed present. Note that classifier blocks and ordinary model refusals (the model itself declining) both surface as `stop_reason: "refusal"`; `stop_details.category` tells you which class you're handling, and therefore whether retrying on a fallback model is the right response. The classifier can fire **before any output** (empty `content` array; not billed at all — no input or output tokens, no rate-limit consumption) or **mid-stream** after partial output (already-streamed output is billed at normal rates — discard the partial output rather than treating it as complete). Code that reads `response.content[0]` unconditionally will break — check `stop_reason` first:
|
||||
|
||||
```python
|
||||
response = client.messages.create(model="claude-fable-5", max_tokens=1024, messages=[...])
|
||||
if response.stop_reason == "refusal":
|
||||
# classifiers declined; content is empty (pre-output) or partial (mid-stream)
|
||||
handle_refusal()
|
||||
else:
|
||||
print(response.content[0].text)
|
||||
```
|
||||
|
||||
Three ways to retry a refused request on another model, in order of preference:
|
||||
|
||||
**1. Server-side `fallbacks` parameter (beta: Claude API and Claude Platform on AWS) — preferred.** One round trip, a plain client, no client-side logic. Name substitute models (the only supported fallback target at launch is `claude-opus-4-8`, expansion expected); on a policy decline the API runs the next model on the same request and returns its answer, with credit-style repricing applied automatically. A `stop_reason: "refusal"` on the final response means the whole chain refused.
|
||||
|
||||
```python
|
||||
response = client.beta.messages.create(
|
||||
model="claude-fable-5",
|
||||
max_tokens=1024,
|
||||
betas=["server-side-fallback-2026-06-01"],
|
||||
fallbacks=[{"model": "claude-opus-4-8"}],
|
||||
messages=[{"role": "user", "content": "Hello, Claude"}],
|
||||
)
|
||||
|
||||
# Switch points: one fallback block per model that ran and declined this turn
|
||||
for block in response.content:
|
||||
if block.type == "fallback":
|
||||
print(f"{block.from_.model} declined; {block.to.model} continued")
|
||||
|
||||
# Served-by signal: covers every fallback-served turn, INCLUDING sticky turns
|
||||
# (sticky-served turns carry no fallback block — nothing declined this turn)
|
||||
iterations = getattr(response.usage, "iterations", None) or []
|
||||
if any(entry.type == "fallback_message" for entry in iterations):
|
||||
print(f"Served by {response.model}")
|
||||
```
|
||||
|
||||
Key semantics:
|
||||
|
||||
- **Header must be exactly `server-side-fallback-2026-06-01`** — other `server-side-fallback-*` values reject the `fallbacks` param with a 400. The current header carries the *earliest* date of the series (`-2026-06-09` and `-2026-06-02` were earlier previews) — do not "correct" it to a newer-looking date. Rejected on the Batches API; not available on Bedrock/Vertex (use pattern 2 there — the SDK middleware). Entries may override `max_tokens` per hop (bounding that attempt's own output independently of the top-level `max_tokens`); `thinking`, `output_config`, and `speed` overrides are rolling out (`speed` additionally requires its beta) — until your requests accept them, include only `model` and `max_tokens` in each entry. Entries must be distinct and must be in the requested model's `allowed_fallback_models` (visible on `/v1/models` under the beta). The request *with an entry's overrides merged in* must be valid as a direct request to that entry's model.
|
||||
- **Triggers on policy declines only** — rate limits, overloads, and server errors on the requested model are returned as-is, never falling back.
|
||||
- **Reading the response:** a `fallback` content block (`{"type": "fallback", "from": {"model": ...}, "to": {"model": ...}}`) marks each switch point in `content`; the served-by signal is a `fallback_message` entry in `usage.iterations` (don't rely on the block — sticky-served turns have none). Top-level `model` names the model that produced the message.
|
||||
- **Billing:** `usage.iterations` is the per-attempt source of truth; top-level `usage` covers only the attempt that produced the returned message. Declined-before-output attempts are reported but not billed; fallback attempts bill at the fallback model's rates. Each attempt claims the rate limits of the model that ran it — if the fallback model is rate-limited or overloaded, the refusal is returned instead with `stop_details.recommended_model` naming the canonical model ID to retry directly (populated only when the request included `fallbacks` and the attempt couldn't be made) — size fallback-model limits for expected refusal volume.
|
||||
- **Sticky routing:** once a conversation falls back, later non-streaming requests with `fallbacks` are served directly by the fallback model for ~1 hour (best-effort; org-scoped content-hash record, not message content; not recorded for ZDR orgs). Handle the requested model being tried again at any time.
|
||||
- **Echoing fallback turns back:** after a mid-output fallback, omit `thinking`, `redacted_thinking`, and `tool_use` blocks — plus any `server_tool_use` block without its matching `server_tool_result`, and any other unrecognized model-internal block type — that appear *before* the final `fallback` block; text blocks, paired server-tool blocks, and everything after the boundary echo normally. The `fallback` block itself is an ignored audit marker (keep or drop). Streaming: the retry happens on the same stream and already-received content is never invalidated — a pre-output block is seamless (`message_start` names the fallback model; the `fallback` block arrives as an ordinary `content_block_start`, first in `content` — there is no special SSE event type; note `message_start` arrives only after the declined attempt, so time-to-first-byte includes it), and a mid-stream block keeps the partial, marks the boundary with the block, and continues — only the partial's `text` blocks are passed to the fallback model as continuation context (other block types stay in `content` but aren't part of it). Sticky routing is **not consulted on streaming requests** in the initial release, so on streams the `fallback` block check is the complete signal; non-streaming mid-output declines omit the declined partial entirely.
|
||||
|
||||
**2. SDK client-side middleware — for providers without server-side fallbacks (Bedrock, Vertex).** Register it on the client and every `client.beta.messages` request (streaming included) retries refusals automatically, splicing the fallback model's events onto the open stream in the same wire shape as pattern 1 (a `fallback` content block at each boundary, per-hop `usage.iterations`). It is also a beta surface: the middleware sends the `fallback-credit-2026-06-01` header by default so retries are repriced via credit tokens (override with its `betas` option). `BetaFallbackState` pins follow-up turns to the model that accepted (the client-side analog of sticky routing) — reuse one state object per conversation:
|
||||
|
||||
```python
|
||||
from anthropic import Anthropic, BetaFallbackState, BetaRefusalFallbackMiddleware
|
||||
|
||||
client = Anthropic(middleware=[BetaRefusalFallbackMiddleware([{"model": "claude-opus-4-8"}])])
|
||||
state = BetaFallbackState() # pins follow-ups to the model that accepted
|
||||
with state:
|
||||
response = client.beta.messages.create(model="claude-fable-5", max_tokens=1024, messages=messages)
|
||||
```
|
||||
|
||||
Create **one state per conversation** — it is the pinning scope; sharing one across conversations pins unrelated threads together, and a conversation without a state is never pinned. Per-language naming (from the GA SDK examples — don't improvise):
|
||||
|
||||
- **TypeScript**: `betaRefusalFallbackMiddleware([...])` in the client's `middleware` array; pass `{ fallbackState: state }` (a `BetaFallbackState`) as a request option.
|
||||
- **Go**: `option.WithMiddleware(betafallback.BetaRefusalFallbackMiddleware([]anthropic.BetaFallbackParam{{Model: ...}}))` (package `lib/betafallback`); state via `betafallback.WithBetaFallbackState(&betafallback.BetaFallbackState{})` passed as a request option. Server-side equivalents: `Fallbacks: []anthropic.BetaFallbackParam{...}` + `anthropic.AnthropicBetaServerSideFallback2026_06_01`.
|
||||
- **C#**: it's a *handler* — `new AnthropicClient { Handlers = [new BetaRefusalFallbackHandler { Fallbacks = [new(Model.ClaudeOpus4_8)] }] }` (namespace `Anthropic.Helpers`); state via `BetaFallbackState.Create()` scoped per call with `using (fallbackState.Use()) { ... }`. Server-side equivalents: `Fallbacks = [new(Model.ClaudeOpus4_8)]` + `AnthropicBeta.ServerSideFallback2026_06_01`.
|
||||
|
||||
For languages not listed (Java, Ruby, PHP) — or for a full runnable program in any language — each public SDK repo ships a fallbacks example under `examples/` (e.g. `examples/fallbacks.py`, `examples/refusal-fallback/`): WebFetch the repo from `shared/live-sources.md` § SDK Repositories rather than improvising the binding.
|
||||
|
||||
**3. Hand-rolled retry + fallback credit (raw HTTP, or SDKs without the middleware).** Detect the refusal via `stop_reason` and re-send the conversation as-is on a model with broader availability such as `claude-opus-4-8` (Claude Fable 5's protected thinking blocks are silently ignored by other models — no stripping required); keep using the fallback model for subsequent turns. **Fallback credit** (beta: Claude API, Bedrock, Vertex) makes those retries cheaper. Prompt caches are per-model, so a plain retry pays cold cache-writes on the new model. With the `fallback-credit-2026-06-01` beta header (send it on both the original request and the retry), a refusal's `stop_details` carries `fallback_credit_token` (opaque; `null` when unavailable) and `fallback_has_prefill_claim`. Echo the token as the top-level `fallback_credit_token` request parameter on the retry (typed in the GA SDKs; on a pre-GA SDK pass it via `extra_body`) and the previously-cached span bills at cache-read rates — the retry costs what it would have if the conversation had been on that model all along. Rules: the retry body must match the refused request **exactly** in every prompt-shaping field (`system`, `messages`, `tools`, `tool_choice`, `thinking` — do **not** strip thinking blocks when redeeming a credit — the server handles them); the retry model must be in the refused model's `allowed_fallback_models`; the token expires in 5 minutes; Batches results carry no tokens. If `fallback_has_prefill_claim` is `true`, append one assistant message echoing the refused response's `content` — the retry model continues from where the refused model stopped (and completed server-tool work isn't re-run). When echoing, strip trailing whitespace from a final `text` block (the prefill validator rejects it; the credit match tolerates that edit), after omitting any unpaired `tool_use` blocks. On a 400, fall back to the unchanged body with the token; on a 400 naming `fallback_credit_token`, retry without it (credit forfeited).
|
||||
|
||||
**Migrating code built on the v1 preview.** If the code you're editing carries any of these markers, it targets the discontinued early-access surface — migrate it to the v2 shapes above, and ship the header and parameter changes together (the v1 parameter shape under the v2 header is a 400):
|
||||
|
||||
| v1 marker (replace) | v2 |
|
||||
|---|---|
|
||||
| `server-side-fallback-2026-06-09` / `-2026-06-02` header | `server-side-fallback-2026-06-01` |
|
||||
| `fallback: {model, on_partial}` single object | `fallbacks: [{model, ...}]` array (1–3); `on_partial` no longer exists — partial-output behavior is fixed (streams keep the partial; non-streaming omits it). Unknown keys in an entry are a 400 |
|
||||
| Top-level `response.fallback` object (`from_model`, `reason`) | Never emitted — read `fallback` content blocks (switch points, no `reason` field) and `usage.iterations` (served-by) |
|
||||
| `event: fallback` SSE with discard indices | No dedicated event; streamed content is never invalidated — the switch arrives as an ordinary `content_block_start`/`stop` pair of type `fallback` |
|
||||
| `fallback_primary` / `fallback_retry` iteration types | Blocked attempts are plain `message` entries; the serving attempt is `fallback_message` |
|
||||
| `reason: "sticky"` | No reason field — sticky turns carry no block; detect via `fallback_message` in `usage.iterations` + `response.model` |
|
||||
| `recommended_model` meaning "primary served the refusal" | Now populated only when the fallback attempt *couldn't run* (rate-limited/overloaded) — its presence means a direct retry on that model may succeed, not that it refused too |
|
||||
|
||||
### Data retention requirement
|
||||
|
||||
Claude Fable 5 requires **30-day data retention** and is not available under zero data retention. Requests from an organization whose data-retention configuration doesn't meet the requirement return `400 invalid_request_error` — if a migration suddenly 400s with no obvious request problem, check the org's retention configuration before debugging the payload. On Amazon Bedrock, Google Vertex AI, and Microsoft Foundry, data-retention requirements are set by each platform.
|
||||
|
||||
### What carries over unchanged
|
||||
|
||||
Same Messages API and tool-use patterns as Opus-tier and Mythos Preview. Supported at launch: `output_config.effort` (`low`/`medium`/`high`/`xhigh`/`max`), Task Budgets (beta, `task-budgets-2026-03-13` header), compaction (beta, `compact-2026-01-12` header), the memory tool, tool-call clearing via context editing, and high-resolution vision (no downscaling cap, as on Opus 4.7+).
|
||||
|
||||
### Behavioral shifts (prompt-tunable)
|
||||
|
||||
None of these are API-breaking, but they're where migrated workloads feel different. Claude Fable 5's biggest gains are on work *above* what prior models could do (long-horizon autonomous runs, first-shot implementations of well-specified systems, end-to-end enterprise deliverables — financial analysis, spreadsheets, slides, docs — code review/debugging and repository-history search, vision on dense or degraded images — it's explicitly trained to use bash and crop tools on flipped/blurry/noisy inputs — navigating ambiguity, parallel sub-agent delegation and collaboration — it reliably sustains ongoing communications with long-running sub-agents and peer agents; note bug-finding gains exclude security-focused analysis, where the cyber classifiers apply) — don't evaluate it only on workloads older models already handled.
|
||||
|
||||
**Longer turns by default — the biggest structural shift.** Individual requests on hard tasks can run many minutes at higher effort (a 15-minute single request is normal when the task involves gathering context, building, and self-verifying). Before migrating, plan timeouts, streaming, and user-facing progress indicators; structure work so callers check in on runs asynchronously rather than blocking inside one request. On ambiguous tasks Claude Fable 5 may need a small nudge to avoid overplanning:
|
||||
|
||||
> When you have enough information to act, act. Do not re-derive facts already established in the conversation, re-litigate a decision the user has already made, or narrate options you will not pursue in user-facing messages. If you are weighing a choice, give a recommendation, not an exhaustive survey. This does not apply to thinking blocks.
|
||||
|
||||
**Consider all effort levels.** `output_config.effort` is the primary intelligence/latency/cost control. Recommended defaults: `high` for most tasks, `xhigh` for the most capability-sensitive workloads, `medium`/`low` for routine work. Lower effort settings — including `low` — still perform very well on Claude Fable 5, often exceeding the `xhigh` or even `max` performance of previous models. Reduce effort if a task completes correctly but takes longer than necessary, or for a quicker interactive working style. At higher effort on routine work, Claude Fable 5 can gather context and deliberate beyond what the task needs (the flip side: higher effort buys excellent verification behavior and the most rigorous outputs). To prevent unrequested tidying or refactoring at higher effort:
|
||||
|
||||
> Don't add features, refactor, or introduce abstractions beyond what the task requires. A bug fix doesn't need surrounding cleanup and a one-shot operation usually doesn't need a helper. Don't design for hypothetical future requirements - do the simplest thing that works well. Avoid premature abstraction. Avoid half-finished implementations either. Don't add error handling, fallbacks, or validation for scenarios that cannot happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs). Don't use feature flags or backwards-compatibility shims when you can just change the code.
|
||||
|
||||
**Instruction following is strong — use it.** Claude Fable 5 is very responsive to explicit communication-style sections in system prompts; invest in them rather than fighting output style downstream. Un-steered — especially at higher effort — it can elaborate beyond what the task needs: heavily-structured PR descriptions, sections on alternatives that weren't chosen, comments narrating what the next line does. You don't need to enumerate these behaviors by name; a brief instruction is just as effective:
|
||||
|
||||
> Lead with the outcome. Your first sentence after finishing should answer "what happened" or "what did you find" — the thing the user would ask for if they said "just give me the TLDR." Supporting detail and reasoning come after. Being readable and being concise are different things, and readability matters more. The way to keep output short is to be selective about what you include (drop details that don't change what the reader would do next), not to compress the writing into fragments, abbreviations, arrow chains like A → B → fails, or jargon.
|
||||
|
||||
**Ground progress claims on long runs.** Require progress claims to be audited against tool results — in testing this nearly eliminated fabricated status reports on tasks designed to elicit them:
|
||||
|
||||
> Before reporting progress, audit each claim against a tool result from this session. Only report work you can point to evidence for; if something is not yet verified, say so explicitly. Report outcomes faithfully: if tests fail, say so with the output; if a step was skipped, say that; when something is done and verified, state it plainly without hedging.
|
||||
|
||||
**State boundaries explicitly.** Claude Fable 5 sometimes takes unrequested-but-adjacent actions (e.g. composing an email straight to drafts, creating backup git branches). Define what it should *not* do:
|
||||
|
||||
> When the user is describing a problem, asking a question, or thinking out loud rather than requesting a change, the deliverable is your assessment. Report your findings and stop. Don't apply a fix until they ask for one. Before running a command that changes system state — restarts, deletes, config edits — check that the evidence actually supports that specific action. A signal that pattern-matches to a known failure may have a different cause.
|
||||
|
||||
**Let it delegate — asynchronously.** Parallel sub-agents are dependable on Claude Fable 5 — instead of suppressing delegation (a common prior-model guardrail), use sub-agents frequently and give explicit guidance on *when* delegation is desirable. Sub-agents that communicate **asynchronously** with the orchestrator outperform spawn-and-block: long-lived agents keep their context instead of re-establishing it per subtask (cache-read savings), the orchestrator isn't bottlenecked on the slowest sub-agent, and context persists across subtasks.
|
||||
|
||||
> Delegate independent subtasks to sub-agents and keep working while they run. Intervene if a sub-agent goes off track or is missing relevant context.
|
||||
|
||||
**Give it a memory surface.** Claude Fable 5 performs notably better when it can write learnings somewhere for future reference — even a plain `.md` file. Tell it where, tell it to consult that file in future sessions, and give it a format:
|
||||
|
||||
> Store one lesson per file with a one-line summary at the top. Record corrections and confirmed approaches alike, including why they mattered. Don't save what the repo or chat history already records; update an existing note rather than creating a duplicate; delete notes that turn out to be wrong.
|
||||
|
||||
**Rare: early stopping.** Deep into long sessions it can occasionally end a turn with a text-only statement of intent ("I'll now run X") without the tool call, or ask permission it doesn't need. A "continue" recovers it interactively; for autonomous pipelines add a system reminder:
|
||||
|
||||
> You are operating autonomously. The user is not watching in real time and cannot answer questions mid-task, so asking 'Want me to…?' or 'Shall I…?' will block the work. For reversible actions that follow from the original request, proceed without asking. Offering follow-ups after the task is done is fine; asking permission after already discussing with the user before doing the work is not. Before ending your turn, check your last paragraph. If it is a plan, an analysis, a question, a list of next steps, or a promise about work you have not done ('I'll…', 'let me know when…'), do that work now with tool calls. End your turn only when the task is complete or you are blocked on input only the user can provide.
|
||||
|
||||
**Rare: context anxiety.** In very long sessions it can worry about running out of context — suggesting a new session or trimming its own work — most often when the harness surfaces a remaining-token countdown. Avoid showing explicit context-budget counts; if you must:
|
||||
|
||||
> You have ample context remaining. Do not stop, summarize, or suggest a new session on account of context limits – continue the work.
|
||||
|
||||
**Give the reason, not just the request.** Claude Fable 5 performs better when it understands the intent behind a request — it connects the task to relevant information rather than inferring intent on its own. This matters most for long-running agents juggling context from disparate workstreams:
|
||||
|
||||
> I'm working on [the larger task] for [who it's for]. They need [what the output enables]. With that in mind: [request].
|
||||
|
||||
**Readability in long agentic sessions.** Deep into extended conversations (many tool calls, large working context) Claude Fable 5 can produce text users find hard to follow — dense arrow-chain shorthand, implementation-level detail, references to thinking the user never saw. A communication-style addendum strongly mitigates this; adapt:
|
||||
|
||||
> Terse shorthand is fine between tool calls (that's you thinking out loud, and brevity there is good). Your final summary is different: it's for a reader who didn't see any of that. If you've been working for a while without the user watching - overnight, across many tool calls, since they last spoke - your final message is their first look at any of it. Write it as a re-grounding, not a continuation of your working thread: the outcome first, then the one or two things you need from them, each explained as if new. The vocabulary you built up while working is yours, not theirs; leave it behind unless you re-introduce it. When you write the summary at the end, drop the working shorthand. Write complete sentences. Spell out terms instead of abbreviating them. Don't use arrow chains, hyphen-stacked compounds, or labels you made up earlier — the reader doesn't have the context to decode them. When you mention files, commits, flags, or other identifiers, give each one its own plain-language clause saying what it is or what changed — never pack several into one parenthesized run or slash-separated list. Open with the outcome: one sentence on what happened or what you found. Then the supporting detail. If you have to choose between short and clear, choose clear.
|
||||
|
||||
### Long-running agent recommendations
|
||||
|
||||
- **Make self-verification explicit.** For long-running builds, instruct it to establish and run its own checking harness on a cadence ("Establish a method for checking your own work as you build; run it every [interval], verifying against the specification with sub-agents"). Separate fresh-context verifier sub-agents tend to outperform self-critique.
|
||||
- **De-prescribe migrated prompts and skills.** Prompts and skills written for prior models are often too prescriptive for Claude Fable 5 and *reduce* output quality. After migrating, A/B the workload with older step-by-step scaffolding removed — prefer stating the goal and constraints over enumerating the steps. Claude Fable 5 is also good at updating skills on the fly from what it learns mid-task — let it.
|
||||
- **Start at the top of your difficulty range.** The teams with the best early-access outcomes gave it their hardest unsolved problems first — have it scope the problem, ask questions, then execute.
|
||||
- **Add a `send_to_user` tool for verbatim mid-task delivery.** When an asynchronous agent must deliver something the user sees *exactly as written* mid-run (a deliverable, a progress update with specific numbers, a direct answer), give it a client-side tool whose input you render directly in the UI — tool inputs are never summarized, so content arrives intact. Return a simple acknowledgement as the tool result:
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "send_to_user",
|
||||
"description": "Display a message directly to the user. Use this for progress updates, partial results, or content the user must see exactly as written before the task finishes.",
|
||||
"input_schema": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"message": { "type": "string", "description": "The content to display to the user." }
|
||||
},
|
||||
"required": ["message"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For agents that only narrate routine progress, default summaries are typically adequate without this tool.
|
||||
|
||||
### Claude Fable 5 Migration Checklist
|
||||
|
||||
- [ ] **[BLOCKS]** Update the `model=` string to `claude-fable-5` (`claude-mythos-5` for Mythos Preview migrators in Project Glasswing)
|
||||
- [ ] **[BLOCKS]** Remove `thinking: {type: "disabled"}` (errors on Claude Fable 5)
|
||||
- [ ] **[BLOCKS]** Replace assistant prefill with structured outputs or system prompt instructions
|
||||
- [ ] **[BLOCKS]** Confirm the org meets the 30-day data-retention requirement (ZDR orgs get `400 invalid_request_error` on every request)
|
||||
- [ ] **[BLOCKS]** Remove all other `thinking` configuration (`{type: "enabled", budget_tokens: N}` returns a 400, same as on Opus 4.7/4.8); control depth with `output_config.effort` instead
|
||||
- [ ] **[TUNE]** Re-baseline token counts, context budgets, `max_tokens`, and cost — ~30% more tokens vs Opus-tier (roughly unchanged from Mythos Preview); use `count_tokens` with `model: "claude-fable-5"` to measure
|
||||
- [ ] **[TUNE]** Add `stop_reason == "refusal"` handling before reading `response.content` (pre-output: empty + unbilled; mid-stream: partial output billed — discard); pick a retry strategy — client-side (replay history as-is; other models ignore Fable's thinking blocks), fallback credit (`fallback-credit-2026-06-01`, exact body), or server-side `fallbacks` (`server-side-fallback-2026-06-01`, Claude API and Claude Platform on AWS)
|
||||
- [ ] **[TUNE]** If you surfaced thinking text to users, plan for protected (encrypted) thinking — pass blocks back unchanged on the same model, never render or cross-model them
|
||||
- [ ] **[TUNE]** Plan for minutes-long turns: timeouts, streaming, async check-ins, progress UX (see Behavior changes above)
|
||||
- [ ] **[TUNE]** Run an effort sweep including low/medium for routine workloads; add the no-tidying instruction if higher effort produces unrequested refactors
|
||||
- [ ] **[TUNE]** A/B with prior-model scaffolding removed — over-prescriptive prompts/skills reduce Claude Fable 5 output quality
|
||||
|
||||
---
|
||||
|
||||
## Verify the Migration
|
||||
|
||||
After updating, spot-check that the new model is actually being used. Replace `YOUR_TARGET_MODEL` with the model string you migrated to (e.g. `claude-opus-4-8`, `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) and keep the assertion prefix in sync:
|
||||
After updating, spot-check that the new model is actually being used. Replace `YOUR_TARGET_MODEL` with the model string you migrated to (e.g. `claude-fable-5`, `claude-opus-4-8`, `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) and keep the assertion prefix in sync:
|
||||
|
||||
```python
|
||||
YOUR_TARGET_MODEL = "claude-opus-4-8" # or "claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5"
|
||||
|
||||
@@ -57,6 +57,8 @@ curl https://api.anthropic.com/v1/models/claude-opus-4-8 \
|
||||
|
||||
| Friendly Name | Alias (use this) | Full ID | Context | Max Output | Status |
|
||||
|-------------------|---------------------|-------------------------------|----------------|------------|--------|
|
||||
| Claude Fable 5 | `claude-fable-5` | — | 1M | 128K | Active |
|
||||
| Claude Mythos 5 | `claude-mythos-5` | — | 1M | 128K | Active (Project Glasswing only) |
|
||||
| Claude Opus 4.8 | `claude-opus-4-8` | — | 1M | 128K | Active |
|
||||
| Claude Opus 4.7 | `claude-opus-4-7` | — | 1M | 128K | Active |
|
||||
| Claude Opus 4.6 | `claude-opus-4-6` | — | 1M | 128K | Active |
|
||||
@@ -64,7 +66,9 @@ curl https://api.anthropic.com/v1/models/claude-opus-4-8 \
|
||||
| Claude Haiku 4.5 | `claude-haiku-4-5` | `claude-haiku-4-5-20251001` | 200K | 64K | Active |
|
||||
|
||||
### Model Descriptions
|
||||
- **Claude Opus 4.8** — The most capable Claude model to date — highly autonomous, state-of-the-art on long-horizon agentic work, knowledge work, and memory; clearer, warmer writing. Same API surface as Opus 4.7 (adaptive thinking only; sampling parameters and `budget_tokens` removed). 1M context window at standard API pricing (no long-context premium). See `shared/model-migration.md` → Migrating to Opus 4.8 — a 4.7 → 4.8 move is a model-ID swap plus prompt re-tuning, no new breaking changes.
|
||||
- **Claude Fable 5** — Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work. Same API surface as Opus 4.7/4.8 with one new breaking change: an explicit `thinking: {type: "disabled"}` returns a 400 — omit the `thinking` parameter instead (thinking is always on, returned in protected/encrypted form). New tokenizer (~30% more tokens than Opus-tier for the same content). Safety classifiers may return `stop_reason: "refusal"`. No assistant prefill. Requires 30-day data retention (not available under ZDR). $10/$50 per MTok; 1M context window (default), 128K max output. See `shared/model-migration.md` → Migrating to Claude Fable 5.
|
||||
- **Claude Mythos 5** — Same capabilities, pricing, limits, and API behavior as Claude Fable 5; only the model ID differs. Available exclusively through Project Glasswing, where it joins (and succeeds) the invitation-only Claude Mythos Preview (`claude-mythos-preview`). Use it only when the org participates in Project Glasswing; otherwise use claude-fable-5.
|
||||
- **Claude Opus 4.8** — The most capable Opus-tier model — highly autonomous, state-of-the-art on long-horizon agentic work, knowledge work, and memory; clearer, warmer writing. Same API surface as Opus 4.7 (adaptive thinking only; sampling parameters and `budget_tokens` removed). 1M context window at standard API pricing (no long-context premium). See `shared/model-migration.md` → Migrating to Opus 4.8 — a 4.7 → 4.8 move is a model-ID swap plus prompt re-tuning, no new breaking changes.
|
||||
- **Claude Opus 4.7** — Previous-generation Opus. Highly autonomous; strong on long-horizon agentic work, knowledge work, vision, and memory. Adaptive thinking only; sampling parameters and `budget_tokens` removed. 1M context window. See `shared/model-migration.md` → Migrating to Opus 4.7.
|
||||
- **Claude Opus 4.6** — Older Opus. Supports adaptive thinking (recommended), 128K max output tokens (requires streaming for large outputs). 1M context window.
|
||||
- **Claude Sonnet 4.6** — Our best combination of speed and intelligence. Supports adaptive thinking (recommended). 1M context window. 64K max output tokens.
|
||||
@@ -75,7 +79,7 @@ curl https://api.anthropic.com/v1/models/claude-opus-4-8 \
|
||||
| Friendly Name | Alias (use this) | Full ID | Status |
|
||||
|-------------------|---------------------|-------------------------------|--------|
|
||||
| Claude Opus 4.5 | `claude-opus-4-5` | `claude-opus-4-5-20251101` | Active |
|
||||
| Claude Opus 4.1 | `claude-opus-4-1` | `claude-opus-4-1-20250805` | Active |
|
||||
| Claude Opus 4.1 | `claude-opus-4-1` | `claude-opus-4-1-20250805` | Deprecated (retires 2026-08-05 — migrate to `claude-opus-4-8`) |
|
||||
| Claude Sonnet 4.5 | `claude-sonnet-4-5` | `claude-sonnet-4-5-20250929` | Active |
|
||||
|
||||
## Deprecated Models (retiring soon)
|
||||
@@ -105,12 +109,16 @@ When a user asks for a model by name, use this table to find the correct model I
|
||||
|
||||
| User says... | Use this model ID |
|
||||
|-------------------------------------------|--------------------------------|
|
||||
| "opus", "most powerful" | `claude-opus-4-8` |
|
||||
| "fable", "most capable model" | `claude-fable-5` |
|
||||
| "most powerful" | `claude-fable-5` |
|
||||
| "mythos", "mythos 5" | `claude-mythos-5` (Project Glasswing participants only; otherwise use `claude-fable-5`) |
|
||||
| "mythos preview" | `claude-mythos-5` (successor to `claude-mythos-preview` — see migration guide) |
|
||||
| "opus" | `claude-opus-4-8` |
|
||||
| "opus 4.8" | `claude-opus-4-8` |
|
||||
| "opus 4.7" | `claude-opus-4-7` |
|
||||
| "opus 4.6" | `claude-opus-4-6` |
|
||||
| "opus 4.5" | `claude-opus-4-5` |
|
||||
| "opus 4.1" | `claude-opus-4-1` |
|
||||
| "opus 4.1" | `claude-opus-4-1` (deprecated, retires 2026-08-05 — suggest `claude-opus-4-8`) |
|
||||
| "opus 4", "opus 4.0" | `claude-opus-4-0` (deprecated — suggest `claude-opus-4-8`) |
|
||||
| "sonnet", "balanced" | `claude-sonnet-4-6` |
|
||||
| "sonnet 4.6" | `claude-sonnet-4-6` |
|
||||
|
||||
@@ -130,10 +130,10 @@ Fix by moving the dynamic piece after the last breakpoint, making it determinist
|
||||
| Model | Minimum |
|
||||
|---|---:|
|
||||
| Opus 4.8, Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 | 4096 tokens |
|
||||
| Sonnet 4.6, Haiku 3.5, Haiku 3 | 2048 tokens |
|
||||
| Fable 5, Sonnet 4.6, Haiku 3.5, Haiku 3 | 2048 tokens |
|
||||
| Sonnet 4.5, Sonnet 4.1, Sonnet 4, Sonnet 3.7 | 1024 tokens |
|
||||
|
||||
A 3K-token prompt caches on Sonnet 4.5 but silently won't on Opus 4.8.
|
||||
A 3K-token prompt caches on Sonnet 4.5 and Fable 5 but silently won't on Opus 4.8.
|
||||
|
||||
**Economics:** Cache reads cost ~0.1× base input price. Cache writes cost **1.25× for 5-minute TTL, 2× for 1-hour TTL**. Break-even depends on TTL: with 5-minute TTL, two requests break even (1.25× + 0.1× = 1.35× vs 2× uncached); with 1-hour TTL, you need at least three requests (2× + 0.2× = 2.2× vs 3× uncached). The 1-hour TTL keeps entries alive across gaps in bursty traffic, but the doubled write cost means it needs more reads to pay off.
|
||||
|
||||
|
||||
@@ -171,7 +171,7 @@ Web search and web fetch let Claude search the web and retrieve page content. Th
|
||||
]
|
||||
```
|
||||
|
||||
### Dynamic Filtering (Opus 4.8 / Opus 4.7 / Opus 4.6 / Sonnet 4.6)
|
||||
### Dynamic Filtering (Fable 5 / Opus 4.8 / Opus 4.7 / Opus 4.6 / Sonnet 4.6)
|
||||
|
||||
The `web_search_20260209` and `web_fetch_20260209` versions support **dynamic filtering** — Claude writes and executes code to filter search results before they reach the context window, improving accuracy and token efficiency. Dynamic filtering is built into these tool versions and activates automatically; you do not need to separately declare the `code_execution` tool or pass any beta header.
|
||||
|
||||
@@ -300,7 +300,7 @@ Two features are available:
|
||||
- **JSON outputs** (`output_config.format`): Control Claude's response format
|
||||
- **Strict tool use** (`strict: true`): Guarantee valid tool parameter schemas
|
||||
|
||||
**Supported models:** Claude Opus 4.8, Claude Sonnet 4.6, and Claude Haiku 4.5. Legacy models (Claude Opus 4.5, Claude Opus 4.1) also support structured outputs.
|
||||
**Supported models:** Claude Fable 5, Claude Opus 4.8, Claude Sonnet 4.6, and Claude Haiku 4.5. Legacy models (Claude Opus 4.5, Claude Opus 4.1) also support structured outputs.
|
||||
|
||||
> **Recommended:** Use `client.messages.parse()` which automatically validates responses against your schema. When using `messages.create()` directly, use `output_config: {format: {...}}`. The `output_format` convenience parameter is also accepted by some SDK methods (e.g., `.parse()`), but `output_config.format` is the canonical API-level parameter.
|
||||
|
||||
|
||||
@@ -196,11 +196,11 @@ If `cache_read_input_tokens` is zero across repeated identical-prefix requests,
|
||||
|
||||
## Extended Thinking
|
||||
|
||||
> **Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.8 and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
|
||||
> **Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Fable 5, Opus 4.8, and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
|
||||
> **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (must be < `max_tokens`, min 1024).
|
||||
|
||||
```typescript
|
||||
// Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
|
||||
// Fable 5 / Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
|
||||
const response = await client.messages.create({
|
||||
model: "claude-opus-4-8",
|
||||
max_tokens: 16000,
|
||||
@@ -276,7 +276,7 @@ const response = await client.messages.create({
|
||||
|
||||
### Compaction (long conversations)
|
||||
|
||||
> **Beta, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.
|
||||
> **Beta, Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.
|
||||
|
||||
```typescript
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
@@ -23,7 +23,7 @@ for await (const event of stream) {
|
||||
|
||||
## Handling Different Content Types
|
||||
|
||||
> **Opus 4.8 / Opus 4.7 / Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.
|
||||
> **Fable 5 / Opus 4.8 / Opus 4.7 / Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.
|
||||
|
||||
```typescript
|
||||
const stream = client.messages.stream({
|
||||
|
||||
@@ -1,42 +1,55 @@
|
||||
---
|
||||
name: frontend-design
|
||||
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
|
||||
description: Guidance for distinctive, intentional visual design when building new UI or reshaping an existing one. Helps with aesthetic direction, typography, and making choices that don't read as templated defaults.
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
|
||||
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
|
||||
# Frontend Design
|
||||
|
||||
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
|
||||
Approach this as the design lead at a small studio known for giving every client a visual identity that could not be mistaken for anyone else's. This client has already rejected proposals that felt templated, and is paying for a distinctive point of view: make deliberate, opinionated choices about palette, typography, and layout that are specific to this brief, and take one real aesthetic risk you can justify.
|
||||
|
||||
## Design Thinking
|
||||
## Ground it in the subject
|
||||
|
||||
Before coding, understand the context and commit to a BOLD aesthetic direction:
|
||||
- **Purpose**: What problem does this interface solve? Who uses it?
|
||||
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
|
||||
- **Constraints**: Technical requirements (framework, performance, accessibility).
|
||||
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
|
||||
If the brief does not pin down what the product or subject is, pin it yourself before designing: name one concrete subject, its audience, and the page's single job, and state your choice. If there's any information in your memory about the human's preferences, context about what they're building, or designs you've made before – use that as a hint. The subject's own world, its materials, instruments, artifacts, and vernacular, is where distinctive choices come from. Build with the brief's real content and subject matter throughout.
|
||||
|
||||
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
|
||||
## Design principles
|
||||
|
||||
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
|
||||
- Production-grade and functional
|
||||
- Visually striking and memorable
|
||||
- Cohesive with a clear aesthetic point-of-view
|
||||
- Meticulously refined in every detail
|
||||
For web designs, the hero is a thesis. Open with the most characteristic thing in the subject's world, in whatever form makes sense for it: a headline, an image, an animation, a live demo, an interactive moment. Be deliberate with your choice: a big number with a small label, supporting stats, and a gradient accent is the template answer, only use if that's truly the best option.
|
||||
|
||||
## Frontend Aesthetics Guidelines
|
||||
Typography carries the personality of the page. Pair the display and body faces deliberately, not the same families you would reach for on any other project, and set a clear type scale with intentional weights, widths, and spacing. Make the type treatment itself a memorable part of the design, not a neutral delivery vehicle for the content.
|
||||
|
||||
Focus on:
|
||||
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
|
||||
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
|
||||
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
|
||||
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
|
||||
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
|
||||
Structure is information. Structural devices, numbering, eyebrows, dividers, labels, should encode something true about the content, not decorate it. Many generic designs use numbered markers (01 / 02 / 03), but that's only appropriate if the content actually is a sequence - like a real process or a typed timeline where order carries information the reader needs. Question if choices like numbered markers actually make sense before incorporating them.
|
||||
|
||||
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
|
||||
Leverage motion deliberately. Think about where and if animation can serve the subject: a page-load sequence, a scroll-triggered reveal, hover micro-interactions, ambient atmosphere. An orchestrated moment usually lands harder than scattered effects; choose what the direction calls for. However, sometimes less is more, and extra animation contributes to the feeling that the design is AI-generated.
|
||||
|
||||
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
|
||||
Match complexity to the vision. Maximalist directions need elaborate execution; minimal directions need precision in spacing, type, and detail. Elegance is executing the chosen vision well.
|
||||
|
||||
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
|
||||
Consider written content carefully. Often a design brief may not contain real content, and it's up to you to come up with copy. Copy can make a design feel as templated as the design itself. See the below section on writing for more guidance.
|
||||
|
||||
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
|
||||
## Process: brainstorm, explore, plan, critique, build, critique again
|
||||
|
||||
For calibration: AI-generated design right now clusters around three looks: (1) a warm cream background (near #F4F1EA) with a high-contrast serif display and a terracotta accent; (2) a near-black background with a single bright acid-green or vermilion accent; (3) a broadsheet-style layout with hairline rules, zero border-radius, and dense newspaper-like columns. All three are legitimate for some briefs, but they are defaults rather than choices, and they appear regardless of subject. Where the brief pins down a visual direction, follow it exactly — the brief's own words always win, including when it asks for one of these looks. Where it leaves an axis free, don't spend that freedom on one of these defaults. Just like a human designer who's hired, there's often a careful balance between doing what you're good at and taking each project as a chance to experiment and learn.
|
||||
|
||||
Work in two passes. First, brainstorm a short design plan based on the human's design brief: create a compact token system with color, type, layout, and signature. Color: describe the palette as 4–6 named hex values. Type: the typefaces for 2+ roles (a characterful display face that's used with restraint, a complementary body face, and a utility face for captions or data if needed). Layout: a layout concept, using one-sentence prose descriptions and ASCII wireframes to ideate and compare. Signature: the single unique element this page will be remembered by that embodies the brief in an appropriate way.
|
||||
|
||||
Then review that plan against the brief before building: if any part of it reads like the generic default you would produce for any similar page (work through a similar prompt to see if you arrive somewhere similar) rather than a choice made for this specific brief — revise that part, say what you changed and why. Only after you've confirmed the relative uniqueness of your design plan should you start to write the code, following the revised plan exactly and deriving every color and type decision from it.
|
||||
|
||||
When writing the code, be careful of structuring your CSS selector specificities. It's easy to generate CSS classes that cancel each other out (especially with a type-based selector like .section and a element-based selector like .cta). This can happen often with paddings/margins between sections.
|
||||
|
||||
Try to do a lot of this planning and iteration in your thinking, and only show ideas to the user when you have higher confidence it'll delight them.
|
||||
|
||||
## Restraint and self-critique
|
||||
|
||||
Spend your boldness in one place. Let the signature element be the one memorable thing, keep everything around it quiet and disciplined, and cut any decoration that does not serve the brief. Not taking a risk can be a risk itself! Build to a quality floor without announcing it: responsive down to mobile, visible keyboard focus, reduced motion respected. Critique your own work as you build, taking screenshots if your environment supports it – a picture is worth 1000 tokens. Consider Chanel's advice: before leaving the house, take a look in the mirror and remove one accessory. Human creators have memory and always try to do something new, so if you have a space to quickly jot down notes about what you've tried, it can help you in future passes.
|
||||
|
||||
## More on writing in design
|
||||
|
||||
Words appear in a design for one reason: to make it easier to understand, and therefore easier to use. They are design material, not decoration. Bring the same intentionality to copy that you would bring to spacing and color. Before writing anything, ask what the design needs to say, and how it can best be said to help the person navigate the experience.
|
||||
|
||||
Write from the end user's side of the screen. Name things by what people control and recognize, never by how the system is built. A person manages notifications, not webhook config. Describe what something does in plain terms rather than selling it. Being specific is always better than being clever.
|
||||
|
||||
Use active voice as default. A control should say exactly what happens when it's used: "Save changes," not "Submit." An action keeps the same name through the whole flow, so the button that says "Publish" produces a toast that says "Published." The vocabulary of an interface is the signposting for someone navigating the product. Cohesion and consistency are how people learn their way around.
|
||||
|
||||
Treat failure and emptiness as moments for direction, not mood. Explain what went wrong and how to fix it, in the interface's voice rather than a person's. Errors don't apologize, and they are never vague about what happened. An empty screen is an invitation to act.
|
||||
|
||||
Keep the register conversational and tuned: plain verbs, sentence case, no filler, with tone matched to the brand and the audience. Let each element do exactly one job. A label labels, an example demonstrates, and nothing quietly does double duty.
|
||||
|
||||
Reference in New Issue
Block a user