Update claude-api skill: auth, cloud providers, Managed Agents fixes, token counting (#1276)

* Sync claude-api skill with latest upstream updates

- Add token-counting.md and SKILL.md trigger description update
- Add auth guidance: env credential resolution, ant auth login, OAuth/WIF doc links, 401 causes
- Add mid-conversation system messages (beta) to prompt-caching, agent-design, SKILL.md, Python/TS READMEs
- Add cache pre-warming (max_tokens: 0) section to prompt-caching
- Add Managed Agents pre-flight viability check to onboarding and overview
- Add Bedrock model-ID section to model-migration; add Bedrock row to live-sources
- Add /claude-api migrate subcommand row and migrate-entry callout
- Fix MA networking config: limited type with allow_package_managers/allow_mcp_servers
- Bump MA create-operations rate limit to 300 RPM
- Fix MA SDK drift: sessions.events.stream(), event.name, typed event arrays
- Add SDK coverage: stop_details, error .type, C# tool runner + MA support, Go model constants, Java 2.34.0, client config, response helpers, auto-pagination, advisor tool
- Move Sonnet 4 / Opus 4 to deprecated in models.md

* Add Anthropic CLI and Claude Platform on AWS docs to claude-api skill

- Add shared/anthropic-cli.md: install, auth profiles, OAuth scopes, command
  structure, version-controlled Managed Agents resources, credential traps
- Add shared/claude-platform-on-aws.md: AnthropicAWS clients, SigV4 auth,
  workspace_id, regions, feature availability
- Restore cross-references to both files throughout SKILL.md and the
  managed-agents docs (previously rewritten to live-sources.md pointers)
- Restore Claude Platform on AWS provider taxonomy in SKILL.md, the
  migration-guide section, and live-sources rows
This commit is contained in:
Lance Martin
2026-06-07 13:21:33 -07:00
committed by GitHub
parent da20c92503
commit c30d329f58
30 changed files with 960 additions and 83 deletions

View File

@@ -1,6 +1,9 @@
---
name: claude-api
description: "Build, debug, and optimize Claude API / Anthropic SDK apps. Apps built with this skill should include prompt caching. Also handles migrating existing Claude API code between Claude model versions (4.5 → 4.6, 4.6 → 4.7, retired-model replacements). TRIGGER when: code imports `anthropic`/`@anthropic-ai/sdk`; user asks for the Claude API, Anthropic SDK, or Managed Agents; user adds/modifies/tunes a Claude feature (caching, thinking, compaction, tool use, batch, files, citations, memory) or model (Opus/Sonnet/Haiku) in a file; questions about prompt caching / cache hit rate in an Anthropic SDK project. SKIP: file imports `openai`/other-provider SDK, filename like `*-openai.py`/`*-generic.py`, provider-neutral code, general programming/ML."
description: |-
Reference for the Claude API / Anthropic SDK — model ids, pricing, params, streaming, tool use, MCP, agents, caching, token counting, model migration.
TRIGGER — read BEFORE opening the target file; don't skip because it "looks like a one-liner" — whenever: the prompt names Claude/Anthropic in any form (Claude, Anthropic, Opus, Sonnet, Haiku, `anthropic`, `@anthropic-ai`, `claude-*`, `us.anthropic.*`, `[1m]`); the user asks about an LLM (pricing/model choice/limits/caching) — never answer from memory; OR the task is LLM-shaped with provider unstated (agent/MCP/tool-definition/multi-agent/RAG/LLM-judge/computer-use; generate/summarize/extract/classify/rewrite/converse over NL; debugging refusals/cutoffs/streaming/tool-calls/tokens).
SKIP only when another provider is being worked on (overrides all triggers): OpenAI/GPT/Gemini/Llama/Mistral/Cohere/Ollama named in the query; OR `grep -rE 'openai|langchain_openai|google.generativeai|genai|mistralai|cohere|ollama'` over the project hits (run this grep FIRST if no provider named — don't Read the file).
license: Complete terms in LICENSE.txt
---
@@ -35,6 +38,9 @@ For the Claude model version, please use Claude Opus 4.8, which you can access v
If the User Request at the bottom of this prompt is a bare subcommand string (no prose), search every **Subcommands** table in this document — including any in sections appended below — and follow the matching Action column directly. This lets users invoke specific flows via `/claude-api <subcommand>`. If no table in the document matches, treat the request as normal prose.
| Subcommand | Action |
|---|---|
| `migrate` | Migrate existing Claude API code to a newer model. **Read `shared/model-migration.md` immediately** and follow it in order: Step 0 (confirm scope — ask which files/directories before any edit), Step 1 (classify each file), then the per-target breaking-changes section. Do not summarize the guide — execute it. If the user did not name a target model, ask which model to migrate to in the same turn as the scope question. |
---
@@ -81,11 +87,11 @@ Before reading code examples, determine which language the user is working in:
| Java | Yes (beta) | Yes (beta) | Beta tool use with annotated classes |
| Go | Yes (beta) | Yes (beta) | `BetaToolRunner` in `toolrunner` pkg |
| Ruby | Yes (beta) | Yes (beta) | `BaseTool` + `tool_runner` in beta |
| C# | No | No | Official SDK |
| C# | Yes (beta) | Yes (beta) | `BetaToolRunner` + raw JSON schema |
| PHP | Yes (beta) | Yes (beta) | `BetaRunnableTool` + `toolRunner()` |
| cURL | N/A | Yes (beta) | Raw HTTP, no SDK features |
> **Managed Agents code examples**: dedicated language-specific READMEs are provided for Python, TypeScript, Go, Ruby, PHP, Java, and cURL (`{lang}/managed-agents/README.md`, `curl/managed-agents.md`). Read your language's README plus the language-agnostic `shared/managed-agents-*.md` concept files. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML — its URL is in `shared/live-sources.md`. If a binding you need isn't shown in the README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently have Managed Agents support; use cURL-style raw HTTP requests against the API.
> **Managed Agents code examples**: dedicated language-specific READMEs are provided for Python, TypeScript, Go, Ruby, PHP, Java, and cURL (`{lang}/managed-agents/README.md`, `curl/managed-agents.md`). Read your language's README plus the language-agnostic `shared/managed-agents-*.md` concept files. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.
---
@@ -105,16 +111,16 @@ Before reading code examples, determine which language the user is working in:
> **Note:** Managed Agents is the right choice when you want Anthropic to run the agent loop *and* host the container where tools execute — file ops, bash, code execution all run in the per-session workspace. If you want to host the compute yourself or run your own custom tool runtime, Claude API + tool use is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).
> **Third-party providers (Amazon Bedrock, Google Vertex AI, Microsoft Foundry):** Managed Agents is **not available** on Bedrock, Vertex, or Foundry. If you are deploying through any third-party provider, use **Claude API + tool use** for all use cases — including ones where Managed Agents would otherwise be the recommended surface.
> **Cloud-provider access.** **Claude Platform on AWS** is Anthropic-operated with same-day API parity — Managed Agents and every feature in this skill work there, **except self-hosted sandboxes** (see `shared/claude-platform-on-aws.md`). **Amazon Bedrock**, **Google Vertex AI**, and **Microsoft Foundry** do **not** support Managed Agents or Anthropic server-side tools; use **Claude API + tool use** on those.
### Decision Tree
```
What does your application need?
0. Are you deploying through Amazon Bedrock, Google Vertex AI, or Microsoft Foundry?
── Yes → Claude API (+ tool use for agents) — Managed Agents is 1P only.
No → continue.
0. Which provider?
── First-party API or Claude Platform on AWS → continue (full surface available).
└── Amazon Bedrock, Google Vertex AI, or Microsoft Foundry → Claude API (+ tool use for agents); Managed Agents not available there.
1. Single LLM call (classification, summarization, extraction, Q&A)
└── Claude API — one request, one response
@@ -157,7 +163,7 @@ Everything goes through `POST /v1/messages`. Tools and output constraints are fe
**Structured outputs** — Constrains the Messages API response format (`output_config.format`) and/or tool parameter validation (`strict: true`). The recommended approach is `client.messages.parse()` which validates responses against your schema automatically. Note: the old `output_format` parameter is deprecated; use `output_config: {format: {...}}` on `messages.create()`.
**Supporting endpoints** — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting, and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.
**Supporting endpoints** — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting (`POST /v1/messages/count_tokens` — see `shared/token-counting.md`), and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.
---
@@ -173,7 +179,7 @@ Everything goes through `POST /v1/messages`. Tools and output constraints are fe
**ALWAYS use `claude-opus-4-8` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.
**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-5`, never `claude-sonnet-4-5-20250514` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.
**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-6`, never `claude-sonnet-4-6-20251114` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.
A note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.
@@ -211,6 +217,8 @@ See `{lang}/claude-api/README.md` (Compaction section) for code examples. Full d
**Prefix match.** Any byte change anywhere in the prefix invalidates everything after it. Render order is `tools``system``messages`. Keep stable content first (frozen system prompt, deterministic tool list), put volatile content (timestamps, per-request IDs, varying questions) after the last `cache_control` breakpoint.
**Mid-conversation operator instructions** (beta header `mid-conversation-system-2026-04-07`, on supporting models): append `{"role": "system", ...}` to `messages[]` instead of editing top-level `system`. Preserves the cached history prefix and is the prompt-injection-safe operator channel. See `shared/prompt-caching.md` § Mid-conversation system messages.
**Top-level auto-caching** (`cache_control: {type: "ephemeral"}` on `messages.create()`) is the simplest option when you don't need fine-grained placement. Max 4 breakpoints per request. Minimum cacheable prefix is ~1024 tokens — shorter prefixes silently won't cache.
**Verify with `usage.cache_read_input_tokens`** — if it's zero across repeated requests, a silent invalidator is at work (`datetime.now()` in system prompt, unsorted JSON, varying tool set).
@@ -223,7 +231,7 @@ For placement patterns, architectural guidance, and the silent-invalidator audit
**Managed Agents** is a third surface: server-managed stateful agents with Anthropic-hosted tool execution. You create a persisted, versioned Agent config (`POST /v1/agents`), then start Sessions that reference it. Each session provisions a container as the agent's workspace — bash, file ops, and code execution run there; the agent loop itself runs on Anthropic's orchestration layer and acts on the container via tools. The session streams events; you send messages and tool results back.
**Managed Agents is first-party only.** It is not available on Amazon Bedrock, Google Vertex AI, or Microsoft Foundry. For agents on third-party providers, use Claude API + tool use.
**Managed Agents is available on the first-party API and Claude Platform on AWS.** It is **not** available on Amazon Bedrock, Google Vertex AI, or Microsoft Foundry — for agents there, use Claude API + tool use.
**Mandatory flow:** Agent (once) → Session (every run). `model`/`system`/`tools` live on the agent, never the session. See `shared/managed-agents-overview.md` for the full reading guide, beta headers, and pitfalls.
@@ -233,9 +241,9 @@ For placement patterns, architectural guidance, and the silent-invalidator audit
| Subcommand | Action |
|---|---|
| `managed-agents-onboard` | Walk the user through setting up a Managed Agent from scratch. **Read `shared/managed-agents-onboarding.md` immediately** and follow its interview script: mental model → know-or-explore branch → template config → session setup → emit code. Do not summarize — run the interview. |
| `managed-agents-onboard` | Walk the user through setting up a Managed Agent from scratch. **Read `shared/managed-agents-onboarding.md` immediately** and follow its interview script: mental model → know-or-explore branch → template config → session setup → **pre-flight viability check** → emit code. The viability check (reconcile the stated job against configured tools/credentials/data) catches under-resourced setups — missing a tool, credential, or data access — before the agent burns budget. Do not summarize — run the interview. |
**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, outcomes, multiagent, webhooks, memory, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML (URL in `shared/live-sources.md`). If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently have Managed Agents support; use raw HTTP from `curl/managed-agents.md` as a reference.
**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, outcomes, multiagent, webhooks, memory, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.
**When the user wants to set up a Managed Agent from scratch** (e.g. "how do I get started", "walk me through creating one", "set up a new agent"): read `shared/managed-agents-onboarding.md` and run its interview — same flow as the `managed-agents-onboard` subcommand.
@@ -261,6 +269,8 @@ After detecting the language, read the relevant files based on what the user nee
→ Read `shared/model-migration.md`
**Prompt caching / optimize caching / "why is my cache hit rate low":**
→ Read `shared/prompt-caching.md` + `{lang}/claude-api/README.md` (Prompt Caching section)
**Count tokens in a file / prompt / diff ("how many tokens is X"):**
→ Read `shared/token-counting.md` — use `messages.count_tokens`, never `tiktoken`
**Function calling / tool use / agents:**
→ Read `{lang}/claude-api/README.md` + `shared/tool-use-concepts.md` + `{lang}/claude-api/tool-use.md`
@@ -275,7 +285,7 @@ After detecting the language, read the relevant files based on what the user nee
→ Read `{lang}/claude-api/README.md` + `{lang}/claude-api/files-api.md`
**Managed Agents (server-managed stateful agents with workspace):**
→ Read `shared/managed-agents-overview.md` + the rest of the `shared/managed-agents-*.md` files. For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML (URL in `shared/live-sources.md`). If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently support Managed Agents — use raw HTTP from `curl/managed-agents.md` as a reference.
→ Read `shared/managed-agents-overview.md` + the rest of the `shared/managed-agents-*.md` files. For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support — see `csharp/claude-api.md` for details, or `curl/managed-agents.md` for raw HTTP reference.
### Claude API (Full File Reference)
@@ -316,7 +326,7 @@ Live documentation URLs are in `shared/live-sources.md`.
- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: "adaptive"}` — do NOT use `budget_tokens` for new 4.6 code (deprecated on both Opus 4.6 and Sonnet 4.6; for gradual migration of existing code, see the transitional escape hatch in `shared/model-migration.md` — note this carve-out does not apply to Opus 4.7 or 4.8). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.
- **4.6/4.7/4.8 family prefill removed:** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead.
- **Confirm migration scope before editing:** When a user asks to migrate code to a newer Claude model without naming a specific file, directory, or file list, **ask which scope to apply first** — the entire working directory, a specific subdirectory, or a specific set of files. Do not start editing until the user confirms. Imperative phrasings like "migrate my codebase", "move my project to X", "upgrade to Sonnet 4.6", or bare "migrate to Opus 4.8" are **still ambiguous** — they tell you what to do but not where, so ask. Proceed without asking only when the prompt names an exact file, a specific directory, or an explicit file list ("migrate `app.py`", "migrate everything under `services/`", "update `a.py` and `b.py`"). See `shared/model-migration.md` Step 0.
- **`max_tokens` defaults:** Don't lowball `max_tokens` — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to `~16000` (keeps responses under SDK HTTP timeouts). For streaming requests, default to `~64000` (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (`~256`), cost caps, or deliberately short outputs.
- **`max_tokens` defaults:** Don't lowball `max_tokens` — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to `~16000` (keeps responses under SDK HTTP timeouts). For streaming requests, default to `~64000` (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (`~256`), cost caps, deliberately short outputs, or **`max_tokens: 0`** for cache pre-warming (see `shared/prompt-caching.md` → Pre-warming).
- **128K output tokens:** Opus 4.6, Opus 4.7, and Opus 4.8 support up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.
- **Tool call JSON parsing (4.6/4.7/4.8 family):** Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.
- **Structured outputs (all models):** Use `output_config: {format: {...}}` instead of the deprecated `output_format` parameter on `messages.create()`. This is a general API change, not 4.6-specific.

View File

@@ -1,6 +1,6 @@
# Claude API — C#
> **Note:** The C# SDK is the official Anthropic SDK for C#. Tool use is supported via the Messages API. A class-annotation-based tool runner is not available; use raw tool definitions with JSON schema. The SDK also supports Microsoft.Extensions.AI IChatClient integration with function invocation.
> **Note:** The C# SDK is the official Anthropic SDK for C#. Tool use is supported via the Messages API with a beta `BetaToolRunner` for automatic tool execution loops. The SDK also supports Microsoft.Extensions.AI IChatClient integration with function invocation and Managed Agents (beta).
## Installation
@@ -400,3 +400,48 @@ new BetaRequestDocumentBlock {
```
The non-beta `DocumentBlockParamSource` union has no file-ID variant — file references need `client.Beta.Messages.Create()`.
---
## Tool Runner (Beta)
The C# SDK provides a `BetaToolRunner` for automatic tool execution loops. Define tools with raw JSON schemas, and the runner handles the API call → tool execution → result feedback loop.
```csharp
using Anthropic.Models.Beta.Messages;
// Define tools and create params as shown in the Tool Use section above,
// but using the beta namespace types (BetaToolUnion, etc.)
var runner = client.Beta.Messages.ToolRunner(betaParams);
await foreach (BetaMessage message in runner)
{
foreach (var block in message.Content)
{
if (block.TryPickText(out var text))
{
Console.WriteLine(text.Text);
}
}
}
```
---
## Stop Details
When `StopReason` is `"refusal"`, the response includes structured `StopDetails`:
```csharp
if (response.StopReason == "refusal" && response.StopDetails is { } details)
{
Console.WriteLine($"Category: {details.Category}");
Console.WriteLine($"Explanation: {details.Explanation}");
}
```
---
## Managed Agents (Beta)
The C# SDK supports Managed Agents via `client.Beta.Agents`, `client.Beta.Sessions`, `client.Beta.Environments`, and related namespaces. See `shared/managed-agents-overview.md` for the architecture and `curl/managed-agents.md` for the wire-level reference.

View File

@@ -42,7 +42,9 @@ curl -X POST https://api.anthropic.com/v1/environments \
"config": {
"type": "cloud",
"networking": {
"type": "package_managers_and_custom",
"type": "limited",
"allow_package_managers": true,
"allow_mcp_servers": true,
"allowed_hosts": ["api.example.com"]
}
}

View File

@@ -27,11 +27,17 @@ client := anthropic.NewClient(
---
## Model Constants
The Go SDK provides typed model constants: `anthropic.ModelClaudeOpus4_8`, `anthropic.ModelClaudeOpus4_7`, `anthropic.ModelClaudeSonnet4_6`, `anthropic.ModelClaudeHaiku4_5_20251001`. Use `ModelClaudeOpus4_8` unless the user specifies otherwise.
---
## Basic Message Request
```go
response, err := client.Messages.New(context.Background(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeOpus4_6,
Model: anthropic.ModelClaudeOpus4_8,
MaxTokens: 16000,
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("What is the capital of France?")),
@@ -278,12 +284,12 @@ Enable Claude's internal reasoning by setting `Thinking` in `MessageNewParams`.
**Adaptive thinking is the recommended mode for Claude 4.6+ models.** Claude decides dynamically when and how much to think. Combine with the `effort` parameter for cost-quality control.
Derived from `anthropic-sdk-go/message.go` (`ThinkingConfigParamUnion`, `NewThinkingConfigAdaptiveParam`).
Derived from `anthropic-sdk-go/message.go` (`ThinkingConfigParamUnion`, `ThinkingConfigAdaptiveParam`).
```go
// There is no ThinkingConfigParamOfAdaptive helper — construct the union
// struct-literal directly and take the address of the variant.
adaptive := anthropic.NewThinkingConfigAdaptiveParam()
adaptive := anthropic.ThinkingConfigAdaptiveParam{}
params := anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
@@ -345,7 +351,20 @@ Tools: []anthropic.ToolUnionParam{
},
```
Also available: `WebFetchTool20260209Param`, `MemoryTool20250818Param`, `ToolSearchToolBm25_20251119Param`, `ToolSearchToolRegex20251119Param`.
Also available: `WebFetchTool20260209Param`, `MemoryTool20250818Param`, `ToolSearchToolBm25_20251119Param`, `ToolSearchToolRegex20251119Param`. For the advisor tool, use `BetaAdvisorTool20260301Param` in the beta namespace.
---
## Stop Details
When `StopReason` is `anthropic.StopReasonRefusal`, the response includes structured `StopDetails`:
```go
if resp.StopReason == anthropic.StopReasonRefusal {
fmt.Println("Category:", resp.StopDetails.Category) // "cyber" | "bio" | ""
fmt.Println("Explanation:", resp.StopDetails.Explanation)
}
```
---

View File

@@ -10,14 +10,14 @@ Maven:
<dependency>
<groupId>com.anthropic</groupId>
<artifactId>anthropic-java</artifactId>
<version>2.17.0</version>
<version>2.34.0</version>
</dependency>
```
Gradle:
```groovy
implementation("com.anthropic:anthropic-java:2.17.0")
implementation("com.anthropic:anthropic-java:2.34.0")
```
## Client Initialization
@@ -359,7 +359,7 @@ import com.anthropic.models.messages.CodeExecutionTool20260120;
.addTool(CodeExecutionTool20260120.builder().build())
```
Also available: `WebFetchTool20260209`, `MemoryTool20250818`, `ToolSearchToolBm25_20251119`.
Also available: `WebFetchTool20260209`, `MemoryTool20250818`, `ToolSearchToolBm25_20251119`. For the advisor tool, use `BetaAdvisorTool20260301` in the beta namespace.
### Beta namespace (MCP, compaction)
@@ -408,6 +408,35 @@ for (ContentBlock block : response.content()) {
---
## Stop Details
When `stopReason()` is `"refusal"`, the response includes structured `stopDetails()`:
```java
response.stopDetails().ifPresent(details -> {
System.out.println("Category: " + details.category());
System.out.println("Explanation: " + details.explanation());
});
```
---
## Error Type
`AnthropicServiceException` exposes `.errorType()` returning `Optional<ErrorType>` for programmatic error classification:
```java
try {
client.messages().create(params);
} catch (AnthropicServiceException e) {
e.errorType().ifPresent(type ->
System.out.println("Error type: " + type) // RATE_LIMIT_ERROR, OVERLOADED_ERROR, etc.
);
}
```
---
## Files API (Beta)
Under `client.beta().files()`. File references in messages need the beta message types (non-beta `DocumentBlockParam.Source` has no file-ID variant).

View File

@@ -373,3 +373,30 @@ $response = $client->beta->messages->create(
```
**Server-side tools** (bash, web_search, text_editor, code_execution) are GA and work on both paths — `Anthropic\Messages\ToolBash20250124` / `WebSearchTool20260209` / `ToolTextEditor20250728` / `CodeExecutionTool20260120` for non-beta, `Anthropic\Beta\Messages\BetaToolBash20250124` / `BetaWebSearchTool20260209` / `BetaToolTextEditor20250728` / `BetaCodeExecutionTool20260120` for beta. No `betas:` header needed for these.
---
## Stop Details
When `stopReason` is `'refusal'`, the response includes structured `stopDetails`:
```php
if ($message->stopReason === 'refusal' && $message->stopDetails !== null) {
echo "Category: " . $message->stopDetails->category . "\n"; // "cyber" | "bio" | null
echo "Explanation: " . $message->stopDetails->explanation . "\n";
}
```
---
## Error Type
`APIStatusException` exposes a `->type` property for programmatic error classification:
```php
try {
$client->messages->create(...);
} catch (\Anthropic\Core\Exceptions\APIStatusException $e) {
echo $e->type?->value; // "rate_limit_error", "overloaded_error", etc.
}
```

View File

@@ -11,10 +11,12 @@ pip install anthropic
```python
import anthropic
# Default (uses ANTHROPIC_API_KEY env var)
# Default — resolves credentials from the environment:
# ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
# Prefer this for local dev; don't hardcode a key.
client = anthropic.Anthropic()
# Explicit API key
# Explicit API key (only when you must inject a specific key)
client = anthropic.Anthropic(api_key="your-api-key")
# Async client
@@ -23,6 +25,67 @@ async_client = anthropic.AsyncAnthropic()
---
## Client Configuration
### Per-request overrides
Use `with_options()` to override client settings for a single call without mutating the client:
```python
client.with_options(timeout=5.0, max_retries=5).messages.create(
model="claude-opus-4-8",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
)
```
### Timeouts
Default request timeout is 10 minutes. Pass a float (seconds) or an `httpx.Timeout` for granular control. On timeout the SDK raises `anthropic.APITimeoutError` (and retries per `max_retries`).
```python
import httpx
client = anthropic.Anthropic(timeout=20.0)
client = anthropic.Anthropic(
timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)
```
### Retries
The SDK auto-retries connection errors, 408, 409, 429, and ≥500 with exponential backoff (default 2 retries). Set `max_retries` on the client or via `with_options()`; `max_retries=0` disables.
### Async performance (aiohttp backend)
For high-concurrency async workloads, install `anthropic[aiohttp]` and pass `DefaultAioHttpClient` instead of the default httpx backend:
```python
from anthropic import AsyncAnthropic, DefaultAioHttpClient
async with AsyncAnthropic(http_client=DefaultAioHttpClient()) as client:
...
```
### Custom HTTP client (proxy, base URL)
Use `DefaultHttpxClient` / `DefaultAsyncHttpxClient` — not raw `httpx.Client` — so the SDK's default timeouts and connection limits are preserved:
```python
from anthropic import Anthropic, DefaultHttpxClient
client = Anthropic(
base_url="http://my.test.server.example.com:8083", # or ANTHROPIC_BASE_URL env var
http_client=DefaultHttpxClient(proxy="http://my.test.proxy.example.com"),
)
```
### Logging
Set `ANTHROPIC_LOG=debug` (or `info`) to enable SDK logging via the standard `logging` module.
---
## Basic Message Request
```python
@@ -53,6 +116,23 @@ response = client.messages.create(
)
```
### Mid-conversation system messages (beta, model-gated)
For operator instructions that arrive mid-conversation (mode switches, injected state), append `{"role": "system", ...}` to `messages` instead of editing top-level `system` — this preserves the cached prefix and carries operator authority. Must follow a user message; cannot be `messages[0]`. Unsupported models return a 400 (`role 'system' is not supported on this model`). See `shared/prompt-caching.md` for when to use this vs. top-level `system`.
```python
response = client.messages.create(
model=MODEL_ID, # must support mid-conversation system messages
max_tokens=16000,
system=[{"type": "text", "text": STABLE_SYSTEM, "cache_control": {"type": "ephemeral"}}],
messages=history + [
{"role": "user", "content": user_message},
{"role": "system", "content": "Terse mode enabled — keep responses under 40 words."},
],
extra_headers={"anthropic-beta": "mid-conversation-system-2026-04-07"},
)
```
---
## Vision (Images)
@@ -222,6 +302,31 @@ except anthropic.APIConnectionError:
---
## Response Helpers
Every response object exposes `_request_id` (populated from the `request-id` header) — log it when reporting failures to Anthropic. Despite the underscore prefix, this property is public.
```python
message = client.messages.create(...)
print(message._request_id) # req_018EeWyXxfu5pfWkrYcMdjWG
print(message.to_json()) # serialize the Pydantic model
print(message.to_dict()) # plain dict
```
To access raw headers or other response metadata, use `.with_raw_response`:
```python
raw = client.messages.with_raw_response.create(
model="claude-opus-4-8",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
)
print(raw.headers.get("request-id"))
message = raw.parse() # the Message object messages.create() would have returned
```
---
## Multi-Turn Conversations
The API is stateless — send the full conversation history each time.
@@ -268,8 +373,9 @@ response2 = conversation.send("What's my name?") # Claude remembers "Alice"
**Rules:**
- Messages must alternate between `user` and `assistant`
- Consecutive same-role messages are allowed — the API combines them into a single turn
- First message must be `user`
- `role: "system"` messages are allowed mid-conversation under the `mid-conversation-system-2026-04-07` beta on supporting models — see § Mid-conversation system messages above
---
@@ -320,7 +426,17 @@ The `stop_reason` field in the response indicates why the model stopped generati
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Claude wants to call a tool — execute it and continue |
| `pause_turn` | Model paused and can be resumed (agentic flows) |
| `refusal` | Claude refused for safety reasons — output may not match your schema |
| `refusal` | Claude refused for safety reasons — check `stop_details` |
### Structured Stop Details
When `stop_reason` is `"refusal"`, the response includes a `stop_details` object with structured information about the refusal:
```python
if response.stop_reason == "refusal" and response.stop_details:
print(f"Category: {response.stop_details.category}") # "cyber" | "bio" | None
print(f"Explanation: {response.stop_details.explanation}")
```
---

View File

@@ -100,6 +100,19 @@ print(f"Status: {cancelled.processing_status}") # "canceling"
---
## List Batches (auto-pagination)
Iterating the return value of any `list()` call auto-paginates across all pages — do not index into `.data` if you want the full set:
```python
for batch in client.messages.batches.list(limit=20):
print(batch.id, batch.processing_status)
```
For manual control, use `first_page.has_next_page()` / `first_page.get_next_page()` / `first_page.next_page_info()`; `first_page.data` holds the current page's items and `first_page.last_id` is the cursor.
---
## Batch with Prompt Caching
```python

View File

@@ -16,14 +16,18 @@ The Files API uploads files for use in Messages API requests. Reference files vi
## Upload a File
The `file` argument accepts a `(filename, content, content_type)` tuple, a `pathlib.Path` (or any `PathLike` — read for you, async-safe with `AsyncAnthropic`), or an open binary file object.
```python
import anthropic
from pathlib import Path
client = anthropic.Anthropic()
uploaded = client.beta.files.upload(
file=("report.pdf", open("report.pdf", "rb"), "application/pdf"),
)
# or: client.beta.files.upload(file=Path("report.pdf"))
print(f"File ID: {uploaded.id}")
print(f"Size: {uploaded.size_bytes} bytes")
```
@@ -87,9 +91,10 @@ response = client.beta.messages.create(
### List Files
Iterate the list result directly — the SDK auto-paginates across all pages. Only use `.data` if you want the first page only.
```python
files = client.beta.files.list()
for f in files.data:
for f in client.beta.files.list():
print(f"{f.id}: {f.filename} ({f.size_bytes} bytes)")
```

View File

@@ -24,6 +24,22 @@ async with async_client.messages.stream(
print(text, end="", flush=True)
```
### Low-level: `stream=True`
`messages.stream()` (above) is the recommended helper — it accumulates state and exposes `text_stream` / `get_final_message()`. If you only need the raw event iterator and want lower memory use, pass `stream=True` to `messages.create()` instead:
```python
for event in client.messages.create(
model="claude-opus-4-8",
max_tokens=64000,
messages=[{"role": "user", "content": "Write a story"}],
stream=True,
):
print(event.type)
```
No final-message accumulation is done for you in this form.
---
## Handling Different Content Types
@@ -160,3 +176,4 @@ except anthropic.APIStatusError as e:
3. **Track token usage** — The `message_delta` event contains usage information
4. **Use timeouts** — Set appropriate timeouts for your application
5. **Default to streaming** — Use `.get_final_message()` to get the complete response even when streaming, giving you timeout protection without needing to handle individual events
6. **Large `max_tokens` without streaming raises `ValueError`** — The SDK refuses non-streaming requests it estimates will exceed ~10 minutes (idle connections drop). Pass `stream=True` / use `messages.stream()`, or explicitly override `timeout`, to suppress the guard.

View File

@@ -15,10 +15,12 @@ pip install anthropic
```python
import anthropic
# Default (uses ANTHROPIC_API_KEY env var)
# Default — resolves credentials from the environment:
# ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
# Prefer this for local dev; don't hardcode a key.
client = anthropic.Anthropic()
# Explicit API key
# Explicit API key (only when you must inject a specific key)
client = anthropic.Anthropic(api_key="your-api-key")
```
@@ -129,7 +131,7 @@ client.beta.sessions.events.send(
import json
# Stream-first: open stream, then send while stream is live
with client.beta.sessions.stream(
with client.beta.sessions.events.stream(
session_id=session.id,
) as stream:
client.beta.sessions.events.send(
@@ -140,7 +142,7 @@ with client.beta.sessions.stream(
... # process events
# Standalone stream iteration:
with client.beta.sessions.stream(
with client.beta.sessions.events.stream(
session_id=session.id,
) as stream:
for event in stream:
@@ -150,7 +152,7 @@ with client.beta.sessions.stream(
print(block.text, end="", flush=True)
elif event.type == "agent.custom_tool_use":
# Custom tool invocation — session is now idle
print(f"\nCustom tool call: {event.tool_name}")
print(f"\nCustom tool call: {event.name}")
print(f"Input: {json.dumps(event.input)}")
# Send result back (see below)
elif event.type == "session.status_idle":
@@ -210,7 +212,7 @@ def run_custom_tool(tool_name: str, tool_input: dict) -> str:
def run_session(client, session_id: str):
"""Stream events and handle custom tool calls."""
while True:
with client.beta.sessions.stream(
with client.beta.sessions.events.stream(
session_id=session_id,
) as stream:
tool_calls = []
@@ -232,7 +234,7 @@ def run_session(client, session_id: str):
# Process custom tool calls
results = []
for call in tool_calls:
result = run_custom_tool(call.tool_name, call.input)
result = run_custom_tool(call.name, call.input)
results.append({
"type": "user.custom_tool_result",
"custom_tool_use_id": call.id,

View File

@@ -111,3 +111,30 @@ message = client.messages.create(
For 1-hour TTL: `cache_control: { type: "ephemeral", ttl: "1h" }`. There's also a top-level `cache_control:` on `messages.create` that auto-places on the last cacheable block.
Verify hits via `message.usage.cache_creation_input_tokens` / `message.usage.cache_read_input_tokens`.
---
## Stop Details
When `stop_reason` is `:refusal`, the response includes structured `stop_details`:
```ruby
if message.stop_reason == :refusal && message.stop_details
puts "Category: #{message.stop_details.category}" # :cyber, :bio, or nil
puts "Explanation: #{message.stop_details.explanation}"
end
```
---
## Error Type
`APIStatusError` exposes a `.type` field for programmatic error classification:
```ruby
begin
client.messages.create(...)
rescue Anthropic::APIStatusError => e
puts e.type # :rate_limit_error, :overloaded_error, etc.
end
```

View File

@@ -90,7 +90,7 @@ Both patterns keep the fixed context small and load detail on demand.
| Constraint (from `prompt-caching.md`) | Agent-specific workaround |
| --- | --- |
| Editing the system prompt mid-session invalidates the cache. | Append a `<system-reminder>` block in the `messages` array instead. The cached prefix stays intact. Claude Code uses this for time updates and mode transitions. |
| Editing the system prompt mid-session invalidates the cache. | Append a `{"role": "system", ...}` message to `messages[]` instead (beta, on supporting models — see `prompt-caching.md` § Mid-conversation system messages). The cached prefix stays intact, and the model treats it as an operator-authority instruction rather than user text. On models that don't support it, fall back to a `<system-reminder>` text block in the user turn. |
| Switching models mid-session invalidates the cache. | Spawn a **subagent** with the cheaper model for the sub-task; keep the main loop on one model. Claude Code's Explore subagents use Haiku this way. |
| Adding/removing tools mid-session invalidates the cache. | Use **tool search** for dynamic discovery — it appends tool schemas rather than swapping them, so the existing prefix is preserved. |

View File

@@ -0,0 +1,246 @@
# Anthropic CLI (`ant`)
The `ant` CLI exposes every Claude API resource as a shell subcommand. Compared to `curl`: request bodies are built from typed flags or piped YAML instead of hand-written JSON, `@path` inlines file contents into any string field, `--transform` extracts fields with a GJSON path (no `jq`), list endpoints auto-paginate (cap total results with `--max-items N`; `--limit` only sets the server page size), and the `beta:` prefix auto-sets the right `anthropic-beta` header.
## When to use the CLI vs the SDK
**CLI for the control plane, SDK for the data plane.** Agents and environments are relatively static resources you define, configure, and debug with `ant` — check the YAML into your repo, apply from CI, inspect from a terminal. Sessions are dynamic and driven by your application through the SDK — create per task, stream events, react to tool calls, integrate into your product. Both hit the same API; the split is about where the call lives, not what's possible.
| | Control plane → `ant` | Data plane → SDK |
|---|---|---|
| Resources | agents, environments, skills, vaults, files | sessions, events |
| Cadence | Once per deploy / ad-hoc | Every task / every turn |
| Lives in | `*.yaml` in your repo + CI + terminal | Application code |
| Typical calls | `create < agent.yaml`, `update --version N`, `list`, `retrieve`, `archive`, `--debug` | `sessions.create()`, `events.stream()`, `events.send()` |
## Install and auth
```sh
# macOS
brew install anthropics/tap/ant
xattr -d com.apple.quarantine "$(brew --prefix)/bin/ant"
# Linux / WSL — pick the release from github.com/anthropics/anthropic-cli/releases
curl -fsSL "https://github.com/anthropics/anthropic-cli/releases/download/v${VERSION}/ant_${VERSION}_$(uname -s | tr A-Z a-z)_$(uname -m | sed -e s/x86_64/amd64/ -e s/aarch64/arm64/).tar.gz" \
| sudo tar -xz -C /usr/local/bin ant
# Or from source (Go 1.22+)
go install github.com/anthropics/anthropic-cli/cmd/ant@latest
```
**Auth** — the CLI resolves credentials the same way the SDKs do (first match wins): explicit flags, then `ANTHROPIC_API_KEY`, then `ANTHROPIC_AUTH_TOKEN`, then the `ANTHROPIC_PROFILE`-selected or active profile, then Workload Identity Federation env vars, then the default profile on disk. Override the host with `ANTHROPIC_BASE_URL` or `--base-url`.
- **API key**: set `ANTHROPIC_API_KEY` in the environment.
- **OAuth profile** (no static key to manage): `ant auth login` opens a browser, exchanges for a short-lived token, and stores a profile under `$ANTHROPIC_CONFIG_DIR` (default `~/.config/anthropic/` on Linux/macOS, `%APPDATA%\Anthropic` on Windows — `configs/<profile>.json` for settings, `credentials/<profile>.json` for tokens). Subsequent `ant` (and SDK) calls pick it up automatically — a bare `Anthropic()` client works after login, but scripts that read `ANTHROPIC_API_KEY` directly do not. Claude Code and the Claude Agent SDK honor the same profile resolution. `ant auth status` shows which credential source and profile won (it reports status only — don't script against its exit code as a health check); `ant auth logout` clears the active profile (`--all` for every profile). On a remote host without a browser, `ant auth login --no-browser` prints the authorize URL and accepts the code back in the terminal.
- **Non-interactive workloads** (CI, servers, containers): interactive login is for development on your own machine — use Workload Identity Federation instead (see the authentication docs via `shared/live-sources.md`).
> **The #1 auth trap:** profiles are only consulted when no API key is set. A stale exported `ANTHROPIC_API_KEY` silently overrides every profile — requests hit whatever org/workspace that key is scoped to. `ant auth status` shows which source won; unset the key (or per-command: `env -u ANTHROPIC_API_KEY ant …`) before relying on a profile. Truly **unset** it — an empty `ANTHROPIC_API_KEY=""` still wins its precedence slot and authenticates with an empty key. The same shadowing applies in reverse to Claude Code: after `ant auth login`, Claude Code may warn about an auth conflict between the profile and its own `/login` credential — keep one (use the profile and `/logout` in Claude Code, or `ant auth logout` to keep Claude Code's own login).
**Named profiles** — an interactive-login token is bound to a single org+workspace, and the API only shows resources belonging to that workspace. If an agent, session, or file you created "disappears", the usual cause is a token scoped to a different workspace than the one that created it (`ant auth status` shows the active workspace). Multi-workspace work means one profile per workspace:
```sh
ant auth login --profile <name> # creates the profile if it doesn't exist; org/workspace picker in browser
ant auth login --profile <name> --workspace-id wrkspc_01... # bind directly, skip the picker
ant profile activate <name> # switch the default profile
ant --profile <name> models list # one-off; equivalent: ANTHROPIC_PROFILE=<name> ant models list
ant profile list # inspect
ant profile set workspace_id wrkspc_01... --profile <name> # edit config keys (workspace_id, base_url, organization_id, …)
```
`ant profile set` edits an existing profile's config — it never creates one, and it does **not** rebind already-issued credentials; run `ant auth login` again under that profile to mint a token for the new target. Pointing `ANTHROPIC_PROFILE` at a profile that doesn't exist is an error, not a fall-through. Refresh tokens eventually hard-expire (they don't slide with use) — when a previously working profile starts failing auth, re-run `ant auth login` before debugging anything else.
**Scopes** — a profile's OAuth scope set is requested at login (`--scope`) and persists on the profile (`scope` is also a `profile set` config key; like other config edits, changing it requires a fresh `ant auth login` to take effect). Privileged scopes — e.g. `org:admin` for organization-administration endpoints — are **not** in the default scope set: pass the full set you want explicitly (`ant auth login --profile admin --scope "... org:admin"`), and the server grants a privileged scope only if your role actually has it. Because the scope set rides on every token the profile mints, keep privileged work on a dedicated profile (`admin` vs `default`) and do day-to-day inference on the unprivileged one, switching with `--profile`/`ANTHROPIC_PROFILE`. Check `ant auth login --help` for the current scope list, and `ant auth status` to see what the active token carries.
To hand the active credential to a subprocess or raw-HTTP script:
```sh
# Bare access token — for curl's Authorization header
curl https://api.anthropic.com/v1/messages \
-H "Authorization: Bearer $(ant auth print-credentials --access-token)" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: oauth-2025-04-20" \
-H "content-type: application/json" \
-d '{"model": "claude-opus-4-8", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello"}]}'
# .env format — sets ANTHROPIC_AUTH_TOKEN (and ANTHROPIC_BASE_URL if the profile has one).
# Output is bare KEY=value (no `export`), so use `set -a` to auto-export for child processes:
set -a; eval "$(ant auth print-credentials --env)"; set +a
python my_script.py # SDK picks up ANTHROPIC_AUTH_TOKEN
```
OAuth tokens go on `Authorization: Bearer` (not `x-api-key:`) **plus the `anthropic-beta: oauth-2025-04-20` header** — converting a raw curl/httpx script from an API key is a header change, not a key swap. The beta header requirement is endpoint-dependent (some endpoints happen to work without it; `/v1/messages` does not) — always send it so requests don't break when you switch endpoints. The token is short-lived and not auto-refreshed when passed via env var, so re-run `print-credentials` before it expires for long-running scripts (`print-credentials` itself refreshes the token if needed). If both `ANTHROPIC_API_KEY` and `ANTHROPIC_AUTH_TOKEN` are set, the SDKs send both and the API rejects the request — unset `ANTHROPIC_API_KEY` before `eval`ing the `--env` output.
**Foot-gun:** `ant auth print-credentials` with **no flags** prints the entire credentials JSON, not the bare token — putting that in an `Authorization` header yields an empty response or HTTP/2 protocol error. Always use `--access-token` for headers (it always reads the named/active profile; a set `ANTHROPIC_API_KEY` doesn't override credential printing).
## Command structure
```
ant <resource>[:<subresource>] <action> [flags]
```
Beta resources (agents, sessions, environments, deployments, skills, vaults, memory stores) live under `beta:` — the CLI auto-sends the right `anthropic-beta` header, so don't pass it yourself unless overriding with `--beta <header>`. For self-hosted environments, `ant beta:worker poll/run` and `ant beta:environments:work stats/stop` drive and monitor the work queue — see `shared/managed-agents-self-hosted-sandboxes.md`.
```sh
ant models list
ant messages create --model claude-opus-4-8 --max-tokens 1024 --message '{role: user, content: "Hello"}'
ant beta:agents retrieve --agent-id agent_01...
ant beta:sessions:events list --session-id session_01...
```
`ant --help` lists resources; append `--help` to any subcommand for its flags.
## Global flags
| Flag | Purpose |
| --- | --- |
| `--format` | `auto` (default: pretty if TTY, compact if piped), `json`, `jsonl`, `yaml`, `pretty`, `raw`, `explore` (interactive TUI) |
| `--transform` | GJSON path applied to the response (per-item on list endpoints). Not applied when `--format raw`. |
| `-r`, `--raw-output` | If the transformed result is a string, print it without quotes (jq semantics). Pair with `--transform` for scalar capture. |
| `--max-items` | Cap total results returned from auto-paginating list endpoints (distinct from `--limit`, which is the server page size). |
| `--format-error` / `--transform-error` | Same as `--format`/`--transform`, applied to error responses. `-r` does not apply to the error path — use `--format-error yaml` for unquoted error scalars. |
| `--base-url` | Override API host |
| `--debug` | Print full HTTP request + response to stderr (API key redacted) |
## Output — `--transform` + `--format`
`--transform` takes a [GJSON path](https://github.com/tidwall/gjson/blob/master/SYNTAX.md). On list endpoints it runs **per item**, not on the envelope.
```sh
ant beta:agents list --transform '{id,name,model}' --format jsonl
```
**Extract a scalar for shell use:** pair `--transform` with `-r` (`--raw-output` — prints strings unquoted, jq-style):
```sh
AGENT_ID=$(ant beta:agents create --name "My Agent" --model '{id: claude-sonnet-4-6}' \
--transform id -r)
```
## Input — flags, stdin, `@file`
**Flags** — scalar fields map directly. Structured fields accept relaxed-YAML syntax (unquoted keys) or strict JSON. Repeatable flags build arrays (each `--tool`, `--event`, `--message` appends one element):
```sh
ant beta:agents create \
--name "Research Agent" \
--model '{id: claude-opus-4-8}' \
--tool '{type: agent_toolset_20260401}' \
--tool '{type: custom, name: search_docs, input_schema: {type: object, properties: {query: {type: string}}}}'
```
**Stdin** — pipe a full JSON or YAML body. Merged with flags; flags win on conflict (for array fields, any flag **replaces** the stdin array entirely — it does not append). Quote the heredoc delimiter (`<<'YAML'`) to disable shell expansion inside the body:
```sh
ant beta:agents create <<'YAML'
name: Research Agent
model: claude-opus-4-8
system: |
You are a research assistant. Cite sources for every claim.
tools:
- type: agent_toolset_20260401
YAML
```
**`@file` references** — inline a file's contents into any string-valued field. Inside structured flag values, quote the path. Binary files are auto-base64'd; force with `@file://` (text) or `@data://` (base64). Escape a literal leading `@` as `\@`.
```sh
ant beta:agents create --name "Researcher" --model '{id: claude-sonnet-4-6}' --system @./prompts/researcher.txt
ant messages create --model claude-opus-4-8 --max-tokens 1024 \
--message '{role: user, content: [
{type: document, source: {type: base64, media_type: application/pdf, data: "@./scan.pdf"}},
{type: text, text: "Extract the text from this scanned document."}
]}' \
--transform 'content.0.text' -r
```
Flags that natively take a file path (e.g. `--file` on `beta:files upload`) accept a bare path without `@`.
## Version-controlled Managed Agents resources
This is the recommended flow for defining agents and environments — check the YAML into your repo and sync via `create` (first time) / `update` (thereafter). See `shared/managed-agents-core.md` for the field reference.
```yaml
# summarizer.agent.yaml
name: Summarizer
model: claude-sonnet-4-6
system: |
You are a helpful assistant that writes concise summaries.
tools:
- type: agent_toolset_20260401
```
```sh
# Create (once) — capture the ID
AGENT_ID=$(ant beta:agents create < summarizer.agent.yaml --transform id -r)
# Update (CI) — needs ID + current version (optimistic lock)
ant beta:agents update --agent-id "$AGENT_ID" --version 1 < summarizer.agent.yaml
```
Same pattern for environments (`ant beta:environments create|update < env.yaml`), then start a session with both IDs:
```sh
ant beta:sessions create --agent "$AGENT_ID" --environment-id "$ENV_ID" --title "Task"
ant beta:sessions:events send --session-id "$SID" \
--event '{type: user.message, content: [{type: text, text: "Summarize X"}]}'
ant beta:sessions:events list --session-id "$SID" --transform 'content.0.text' -r
ant beta:sessions:events stream --session-id "$SID" # live event stream
```
### Interactive session loop (stream-before-send)
`ant beta:sessions:events stream` only delivers events emitted *after* the stream opens — so open it **before** sending the kickoff to avoid missing early events. Use process substitution to hold the stream on a file descriptor, send, then read:
```sh
exec {stream}< <(ant beta:sessions:events stream --session-id "$SID" \
--transform '{type,text:content.#(type=="text").text,err:error.message}' --format yaml)
ant beta:sessions:events send --session-id "$SID" > /dev/null <<'YAML'
events:
- type: user.message
content:
- type: text
text: Summarize the repo README
YAML
type=
while IFS= read -r -u "$stream" line; do
case "$line" in
type:\ session.status_idle) break ;;
type:\ session.error)
IFS= read -r -u "$stream" next || next=
case "$next" in err:\ *) msg=${next#err: } ;; *) msg=unknown ;; esac
printf '\n[Error: %s]\n' "$msg"; break ;;
type:\ *) type=${line#type: } ;;
text:*)
[[ $type == agent.message ]] || continue
val=${line#text: }
case "$val" in '|-'|'|') ;; *) printf '%s' "$val" ;; esac ;;
\ \ *)
if [[ $type == agent.message ]]; then printf '%s\n' "${line# }"; fi ;;
esac
done
exec {stream}<&-
```
This works for interactive exploration and demos. For application code that needs to react to `agent.tool_use` / `agent.custom_tool_use` events, reconnect after drops, or dedup against `events.list`, use the SDK — see `shared/managed-agents-client-patterns.md`.
## Scripting patterns
`--transform id -r` on a list endpoint emits one bare ID per line — compose with `xargs`, or use `--max-items N` to bound the result set without piping through `head`:
```sh
FIRST=$(ant beta:agents list --transform id -r --max-items 1)
ant beta:agents:versions list --agent-id "$FIRST" --transform '{version,created_at}' --format jsonl
```
Error shaping mirrors the success path (note: `-r` does not apply to error output — use `--format-error yaml` for an unquoted scalar here):
```sh
ant beta:agents retrieve --agent-id bogus --transform-error error.message --format-error yaml 2>&1
```
Shell completion: `ant @completion {zsh|bash|fish|powershell}`.
For the full, always-current reference (including per-endpoint flags), WebFetch the **Anthropic CLI** URL in `shared/live-sources.md`.

View File

@@ -0,0 +1,59 @@
# Claude Platform on AWS
**Anthropic-operated** access to the Claude Developer Platform through AWS infrastructure — SigV4 authentication, AWS IAM access control, and AWS Marketplace billing. Because Anthropic operates it, **the API surface matches first-party with same-day parity**: Managed Agents, server-side tools, batches, Files, and every feature in this skill work the same way (**except self-hosted sandboxes** — `config:{type:"self_hosted"}` is not available here; use `cloud`). Model IDs are the bare first-party strings (`claude-opus-4-8`, `claude-sonnet-4-6`) — **no provider prefix**.
> **Not the same as Amazon Bedrock.** Bedrock is partner-operated (AWS runs the service; release schedules vary, feature subset, `anthropic.`-prefixed model IDs). Claude Platform on AWS and Bedrock coexist; pick by whether you need AWS-native IAM/billing with full Anthropic API parity (this page) vs. Bedrock's own ecosystem.
---
## Client & install
| Language | Install | Client |
|---|---|---|
| Python | `pip install -U "anthropic[aws]"` | `from anthropic import AnthropicAWS``AnthropicAWS()` |
| TypeScript | `npm install @anthropic-ai/aws-sdk` | `import AnthropicAws from "@anthropic-ai/aws-sdk"``new AnthropicAws()` |
| Go | `go get github.com/anthropics/anthropic-sdk-go` | `import anthropicaws "github.com/anthropics/anthropic-sdk-go/aws"``anthropicaws.NewClient(ctx, anthropicaws.ClientConfig{})` |
| C# | `dotnet add package Anthropic.Aws` | `new AnthropicAwsClient()` |
| Java | See SDK repo in `shared/live-sources.md` | See SDK repo in `shared/live-sources.md` |
| Ruby | `gem install anthropic aws-sdk-core` | See SDK repo in `shared/live-sources.md` |
| PHP | `composer require anthropic-ai/sdk aws/aws-sdk-php` | See SDK repo in `shared/live-sources.md` |
After construction, **use the client exactly as you would `Anthropic()`**`client.messages.create(...)`, `client.beta.sessions.*`, etc., with bare model IDs.
```python
from anthropic import AnthropicAWS
client = AnthropicAWS() # region + workspace_id from env; see below
client.messages.create(
model="claude-opus-4-8",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
)
```
---
## Required configuration
Two values must be available (constructor args or environment) — **there is no default fallback** for either:
| Value | Env var | Notes |
|---|---|---|
| AWS region | `AWS_REGION` | Required. Unlike `AnthropicBedrock`, there is no `us-east-1` fallback. |
| Workspace ID | `ANTHROPIC_AWS_WORKSPACE_ID` | Required. Routes requests to your Claude workspace. |
Endpoint pattern: `https://aws-external-anthropic.{region}.api.aws/v1/...`. Requests are SigV4-signed with service name `aws-external-anthropic`.
## Authentication
The client resolves AWS credentials via the standard precedence chain: explicit constructor args → environment (`AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`/`AWS_SESSION_TOKEN`) → shared profile → assumed role / instance metadata.
**Short-term API keys** are also supported for cases where SigV4 isn't practical (e.g., browser, simple scripts). Mint one with the per-language token-generator package; pass it as `api_key` on the client. Lifetime is the **lesser of** the requested duration, the underlying credential's expiry, and **12 hours**. For package names and IAM details, WebFetch the Claude Platform on AWS page in `shared/live-sources.md`.
---
## What to tell users
- Treat it as first-party: every section of this skill applies unchanged. Do **not** apply Bedrock's feature-availability mask.
- Model IDs are bare (`claude-opus-4-8`). Do **not** add an `anthropic.` prefix.
- A missing region or `workspace_id` throws at client-construction time (no request is sent). A **403** means the request reached the server — check for a **wrong** `workspace_id` or a missing IAM action on the principal. See the IAM actions reference in `shared/live-sources.md`.

View File

@@ -55,8 +55,10 @@ This file documents HTTP error codes returned by the Claude API, their common ca
- Missing `x-api-key` header or `Authorization` header
- Invalid API key format
- Revoked or deleted API key
- OAuth bearer token sent via `x-api-key` instead of `Authorization: Bearer`
- Both `ANTHROPIC_API_KEY` and `ANTHROPIC_AUTH_TOKEN` set — the SDK sends both headers and the API rejects the request
**Fix:** Ensure `ANTHROPIC_API_KEY` environment variable is set correctly.
**Fix:** Set `ANTHROPIC_API_KEY`, or run `ant auth login` and leave the client constructor empty. For raw HTTP with an OAuth token, use `Authorization: Bearer <token>` (not `x-api-key:`).
---
@@ -185,8 +187,10 @@ thinking: budget_tokens=10000, max_tokens=16000
| 401 | `Anthropic.AuthenticationError` | `anthropic.AuthenticationError` |
| 403 | `Anthropic.PermissionDeniedError` | `anthropic.PermissionDeniedError` |
| 404 | `Anthropic.NotFoundError` | `anthropic.NotFoundError` |
| 413 | `Anthropic.RequestTooLargeError` | `anthropic.RequestTooLargeError` |
| 429 | `Anthropic.RateLimitError` | `anthropic.RateLimitError` |
| 500+ | `Anthropic.InternalServerError` | `anthropic.InternalServerError` |
| 529 | `Anthropic.OverloadedError` | `anthropic.OverloadedError` |
| Any | `Anthropic.APIError` | `anthropic.APIError` |
```typescript
@@ -211,3 +215,15 @@ try {
```
All exception classes extend `Anthropic.APIError`, which has a `status` property. Use `instanceof` checks from most specific to least specific (e.g., check `RateLimitError` before `APIError`).
### Error `.type` Field
All `APIStatusError` subclasses now expose a `.type` property (Python: `.type`, TypeScript: `.type`, Java: `.errorType()`, Go: `.Type()`, Ruby: `.type`, PHP: `.type`) that returns the API error type string (e.g., `"invalid_request_error"`, `"authentication_error"`, `"rate_limit_error"`, `"overloaded_error"`). Use this for programmatic error classification when you need finer granularity than the HTTP status code — for example, distinguishing `"billing_error"` from `"permission_error"` (both map to 403).
```python
except anthropic.APIStatusError as e:
if e.type == "rate_limit_error":
# handle rate limiting
elif e.type == "overloaded_error":
# handle overload
```

View File

@@ -46,6 +46,9 @@ This file contains WebFetch URLs for fetching current information from platform.
| Token Counting | `https://platform.claude.com/docs/en/build-with-claude/token-counting.md` | "Extract token counting API usage and examples" |
| Rate Limits | `https://platform.claude.com/docs/en/api/rate-limits.md` | "Extract current rate limits by tier and model" |
| Errors | `https://platform.claude.com/docs/en/api/errors.md` | "Extract HTTP error codes, meanings, and retry guidance" |
| Amazon Bedrock | `https://platform.claude.com/docs/en/build-with-claude/claude-on-amazon-bedrock.md` | "Extract the AnthropicBedrockMantle client per language, `anthropic.`-prefixed model IDs, auth paths, feature availability, and regions" |
| Claude Platform on AWS | `https://platform.claude.com/docs/en/build-with-claude/claude-platform-on-aws.md` | "Extract the AnthropicAWS client per language, SigV4 auth, credential precedence, short-term API keys, workspace_id, and region requirements" |
| Claude Platform on AWS — IAM actions | `https://platform.claude.com/docs/en/api/claude-platform-on-aws-iam-actions.md` | "Extract the IAM action names, resource ARNs, and policy examples required for each API capability" |
### Tools
@@ -107,6 +110,8 @@ The `ant` CLI provides terminal access to the Claude API. Every API resource is
| Topic | URL | Extraction Prompt |
| ------------- | ------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| Anthropic CLI | `https://platform.claude.com/docs/en/api/sdks/cli.md` | "Extract CLI install, authentication, command structure, and the beta:agents/environments/sessions commands" |
| Authentication overview | `https://platform.claude.com/docs/en/manage-claude/authentication.md` | "Extract the credential options (API keys, interactive OAuth login, Workload Identity Federation) and when to use each" |
| WIF reference | `https://platform.claude.com/docs/en/manage-claude/wif-reference.md` | "Extract credential precedence order, the profile configuration file schema, and the configuration directory layout" |
---

View File

@@ -369,7 +369,7 @@ Managed Agents endpoints have per-organization request-per-minute (RPM) limits,
| Endpoint group | Scope | RPM | Max concurrent |
|---|---|---|---|
| Create operations (Agents, Sessions, Vaults) | organization | 60 | — |
| Create operations (Agents, Sessions, Vaults) | organization | 300 | — |
| All other operations (Agents, Sessions, Vaults) | organization | 600 | — |
| All operations (Environments) | organization | 60 | 5 |

View File

@@ -62,6 +62,16 @@ rescheduling → running ↔ idle → terminated
| Archive | Session becomes **read-only**. Not reversible. |
| Delete | Permanently deletes session, event history, container, and checkpoints. |
These are ops/inspection calls — typically made from a terminal, not application code. From the shell (see `shared/anthropic-cli.md`):
```sh
ant beta:sessions list --transform '{id,title,status,created_at}' --format jsonl
ant beta:sessions retrieve --session-id "$SID"
ant beta:sessions:events stream --session-id "$SID" # watch events live
ant beta:sessions archive --session-id "$SID"
ant beta:sessions delete --session-id "$SID"
```
---
## Sessions
@@ -112,6 +122,8 @@ const session = await client.beta.sessions.create(
);
```
> 💡 **Watch it live in Console.** While developing, print a link so you can click through to the session UI instead of parsing the event stream: `https://platform.claude.com/workspaces/{workspace}/sessions/{session.id}`. Use `default` for the workspace slug unless you created the session in a non-default workspace; the sessions list for your workspace is at `https://platform.claude.com/workspaces/default/sessions`. Include this `print`/`console.log` in code you generate for users who are iterating locally.
**Session creation parameters:**
| Field | Type | Required | Description |
@@ -173,6 +185,8 @@ The agent is a **persistent resource**, not a per-run parameter. The intended pa
**Anti-pattern:** calling `agents.create()` at the top of every script run. This accumulates orphaned agent objects, pays create latency on every invocation, and defeats the versioning model. If you see `agents.create()` in a function that's called per-request or per-cron-tick, that's wrong — hoist it to one-time setup and persist the ID.
> **Recommended — define agents and environments as YAML + apply via the `ant` CLI.** The split is **CLI for the control plane, SDK for the data plane**: agents and environments are relatively static resources you manage with `ant` (version-controlled YAML, applied from CI); sessions are dynamic and driven by your application through the SDK. See `shared/anthropic-cli.md` → *Version-controlled Managed Agents resources* for the `ant beta:agents create < agent.yaml` / `update --version N` flow. The SDK `agents.create()` call shown elsewhere in this doc is the in-code equivalent — use it when you need to provision programmatically, but prefer the YAML flow for anything a human maintains.
### Versioning
Each `POST /v1/agents/{id}` (update) creates a new immutable version (numeric timestamp, e.g. `1772585501101368014`). The agent's history is append-only — you can't edit a past version.

View File

@@ -9,20 +9,24 @@ Creating a session requires an `environment_id`. Environments are **reusable con
### Networking
| Network Policy | Description |
| ------------------------------- | ------------------------------------------------------------- |
| ---------------- | ------------------------------------------------------------- |
| `unrestricted` | Full egress (except legal blocklist) |
| `package_managers_and_custom` | Package managers + custom `allowed_hosts` |
| `limited` | Deny-by-default; opt in via `allowed_hosts` / `allow_package_managers` / `allow_mcp_servers` |
```json
{
"networking": {
"type": "package_managers_and_custom",
"type": "limited",
"allow_package_managers": true,
"allow_mcp_servers": true,
"allowed_hosts": ["api.example.com"]
}
}
```
**MCP caveat:** If using restricted networking, make sure `allowed_hosts` includes your MCP server domains. Otherwise the container can't reach them and tools silently fail.
All three `limited` fields are optional. `allow_package_managers` (default `false`) permits PyPI/npm/etc.; `allow_mcp_servers` (default `false`) permits the agent's configured MCP server endpoints without listing them in `allowed_hosts`.
**MCP caveat:** Under `limited` networking, either set `allow_mcp_servers: true` or add each MCP server domain to `allowed_hosts`. Otherwise the container can't reach them and tools silently fail.
### Creating an environment

View File

@@ -2,7 +2,7 @@
> **Invoked via `/claude-api managed-agents-onboard`?** You're in the right place. Run the interview below — don't summarize it back to the user, ask the questions.
Use this when a user wants to set up a Managed Agent from scratch. Three steps: **branch on know-vs-explore → configure the template → set up the session**. End by emitting working code.
Use this when a user wants to set up a Managed Agent from scratch: **branch on know-vs-explore → configure the template → set up the session → pre-flight viability check → emit working code.** The pre-flight check (§3) is not optional — a setup missing a tool, credential, or data access it needs will fail mid-run, and the gap is usually visible at setup time.
> Read `shared/managed-agents-core.md` alongside this — it has full detail for each knob. This doc is the interview script, not the reference.
@@ -30,8 +30,8 @@ Four shapes, same runtime code path (`sessions.create()` → `sessions.events.se
| Pattern | Trigger | Example |
|---|---|---|
| Event-triggered | Webhook | GitHub PR push → CMA (GitHub tool) → Slack | # <------ MC maybe delete?
| Scheduled | Cron | Daily brief: browser + GitHub + Jira → CMA → Slack | # <------ MC maybe delete?
| Event-triggered | Webhook | GitHub PR push → CMA (GitHub tool) → Slack |
| Scheduled | Cron | Daily brief: browser + GitHub + Jira → CMA → Slack |
| Fire-and-forget PR | Human | Slack slash-command → CMA (GitHub tool) → PR passing CI |
| Research + dashboard | Human | Topic → CMA (web search + `frontend-design` skill) → HTML dashboard |
@@ -70,10 +70,11 @@ Emit as `resources: [{type: "github_repository", url, authorization_token, ...}]
Emit as `resources: [{type: "file", file_id, mount_path}]`. Max 999 file resources. Agent working directory defaults to `/workspace`. Full detail: `shared/managed-agents-environments.md` → Files API.
**Round C — Environment + identity:**
- [ ] Networking: unrestricted internet from the container, or lock egress to specific hosts? (If locked, MCP server domains must be in `allowed_hosts` or tools silently fail.)
**Round C — Identity, success criteria, environment:**
- [ ] Name?
- [ ] Job (one or two sentences — becomes the system prompt)?
- [ ] **What does "done" look like?** Push for concrete, checkable success criteria — not "a good report" but "a CSV with a numeric `price` column per SKU." Explicit criteria give the agent a clear target and let you verify the result; vague ones leave it guessing what "done" means. If they're gradeable, plan to wire an **Outcome** in §2 so the harness grades-and-revises against them. See `shared/managed-agents-outcomes.md`.
- [ ] Networking: unrestricted internet from the container, or lock egress to specific hosts? (If locked, MCP server domains must be in `allowed_hosts` or tools silently fail.)
- [ ] Model? (default `claude-opus-4-8`)
---
@@ -87,27 +88,56 @@ Per-run. Points at the agent + environment, attaches credentials, kicks off.
Credentials are write-only, matched to MCP servers by URL, auto-refreshed. See `shared/managed-agents-tools.md` → Vaults.
**Kickoff:**
- [ ] First message to the agent?
**Kickoff — pick one:**
- [ ] **Conversational:** a first `user.message` to the agent.
- [ ] **Outcome-graded** (recommended when §Round C produced checkable criteria): send a `user.define_outcome` with a rubric *instead of* a `user.message` — the harness iterates and grades against the rubric until satisfied. Don't send both. See `shared/managed-agents-outcomes.md`.
Session creation blocks until all resources mount. Open the event stream before sending the kickoff. Stream is SSE; break on `session.status_terminated`, or on `session.status_idle` with a terminal `stop_reason` — i.e. anything except `requires_action`, which fires transiently while the session waits on a tool confirmation or custom-tool result (see `shared/managed-agents-client-patterns.md` Pattern 5). Usage lands on `span.model_request_end`. Agent-written artifacts end up in `/mnt/session/outputs/` — download via `files.list({scope_id: session.id, betas: ["managed-agents-2026-04-01"]})`.
**Console escape hatch.** In the runtime block you emit, print the session's Console URL right after `sessions.create()` so the user can watch it in the UI while iterating: `print(f"Watch in Console: https://platform.claude.com/workspaces/default/sessions/{session.id}")` (swap `default` for the user's workspace slug if they named one).
---
## 3. Emit the code
## 3. Pre-flight viability check — reconcile the job against the resources
Go straight from the last interview answer to the code — no preamble about the setup-vs-runtime split, no "the critical thing to internalize…", no lecture about `agents.create()` being one-time. The two-block structure below already shows that; don't narrate it. Generate **two clearly-separated blocks** per language detected (Python/TS/cURL — see SKILL.md → Language Detection):
**Do this before emitting any code.** A common, avoidable failure is an under-resourced run: the ask is clear, but the agent is missing a tool, a credential, data access, or the context to act. The agent discovers the gap a few turns in, flails, and gives up — burning the budget to produce nothing. The gap is usually visible at setup time. Catch it here, not after the session fails.
**Block 1 — Setup (run once, store the IDs):**
1. `environments.create()` → persist `env_id`
2. `agents.create()` with everything from §Round AC → persist `agent_id` and `agent_version`
Walk the stated job clause by clause. For each action the agent must take, confirm a resource covers it — and name the gap out loud if one doesn't:
Label: `# ONE-TIME SETUP — run once, save the IDs to config/.env`
| Gap class | Check | If missing |
|---|---|---|
| **Tool / integration** (most catchable upfront — config is statically inspectable) | Every verb in the job maps to an enabled tool or MCP server. "Triage tickets" → a ticketing MCP server; "open a PR" → GitHub MCP server (a `github_repository` mount alone can't open PRs); "search the web" → `web_search` enabled in the toolset. | Add the tool/MCP server in §Round A, or cut the ask from the job. |
| **Credential / access** | Every MCP server has a vault credential attached (§2). Every external host the job touches is reachable — networking `unrestricted`, or the host is in `allowed_hosts`. | Create/attach the vault; widen `allowed_hosts`. These don't fail until runtime — the smoke-test in §4 is how you surface them cheaply. |
| **Data** | Every file, dataset, or repo the job references is mounted as a `resource` (file, `github_repository`, or memory store). | Upload + mount it in §Round B, or tell the agent where to fetch it from. |
| **Prompt quality / criteria** | The job is specific enough to act on, and "done" is checkable (§Round C). | Tighten the job; wire an Outcome. |
**Block 2 — Runtime (run on every invocation):**
State any unmet gaps to the user and resolve them before generating code. Don't emit a config you already know is under-resourced — an agent can't complete a task it lacks the tools, credentials, or data for.
---
## 4. Emit the code
Go straight from the last interview answer to the code — no preamble about the setup-vs-runtime split, no "the critical thing to internalize…", no lecture about `agents.create()` being one-time. The two-block structure below already shows that; don't narrate it. Generate **two clearly-separated blocks**:
**Block 1 — Setup (run once, store the IDs).** Prefer emitting this as **YAML files + `ant` CLI commands** — agents and environments are version-controlled definitions, and the CLI flow is what users should check into their repo and run from CI. Fall back to SDK code only if the user explicitly wants setup in-language or the `ant` CLI is unavailable.
Emit:
1. `<name>.agent.yaml` with everything from §Round AC (flat: `name`, `model`, `system`, `tools`, `mcp_servers`, `skills`)
2. `<name>.environment.yaml` with §Round C networking
3. The apply commands:
```sh
AGENT_ID=$(ant beta:agents create < <name>.agent.yaml --transform id -r)
ENV_ID=$(ant beta:environments create < <name>.environment.yaml --transform id -r)
# CI sync: ant beta:agents update --agent-id "$AGENT_ID" --version N < <name>.agent.yaml
```
See `shared/anthropic-cli.md` for the full CLI reference. If emitting SDK code instead, label it `# ONE-TIME SETUP — run once, save the IDs to config/.env` and call `environments.create()` → `agents.create()`.
**Block 2 — Runtime (run on every invocation).** This is SDK code in the detected language (Python/TS/cURL — see SKILL.md → Language Detection). The runtime path needs to react programmatically to events (tool confirmations, custom tool results, reconnect), which is SDK territory — don't emit shell loops here.
1. Load `env_id` + `agent_id` from config/env
2. `sessions.create(agent=AGENT_ID, environment_id=ENV_ID, resources=[...], vault_ids=[...])`
3. Open stream, `events.send()` the kickoff, loop until `session.status_terminated` or `session.status_idle && stop_reason.type !== 'requires_action'` (see `shared/managed-agents-client-patterns.md` Pattern 5 for the full gate — do not break on bare `session.status_idle`)
2. `sessions.create(agent=AGENT_ID, environment_id=ENV_ID, resources=[...], vault_ids=[...])` — this blocks until resources mount, so a bad file/repo mount surfaces *here*, before any tokens are spent.
3. **Smoke-test first when the job depends on MCP servers, credentials, or reachable hosts.** Credential and MCP-connectivity failures don't surface at `sessions.create()` — only when the agent first tries to use them. Send one cheap probe turn ("Confirm you can reach <service> and list 12 items; don't start the task yet"), check it succeeded, *then* send the real kickoff. A few hundred tokens here beats a runaway session that flails on a missing credential and gives up. Skip for agents with no external dependencies.
4. Open stream, `events.send()` the kickoff (a `user.message`, or a `user.define_outcome` if §2 chose the outcome-graded path), loop until `session.status_terminated` or `session.status_idle && stop_reason.type !== 'requires_action'` (see `shared/managed-agents-client-patterns.md` Pattern 5 for the full gate — do not break on bare `session.status_idle`)
> ⚠️ **Never emit `agents.create()` and `sessions.create()` in the same unguarded block.** That teaches the user to create a new agent on every run — the #1 anti-pattern. If they need a single script, wrap agent creation in `if not os.getenv("AGENT_ID"):`.

View File

@@ -52,6 +52,7 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
| Run tool execution in your own infra / VPC (self-hosted sandbox) | `shared/managed-agents-self-hosted-sandboxes.md``config:{type:"self_hosted"}`, `ANTHROPIC_ENVIRONMENT_KEY`, `EnvironmentWorker.run()` / `ant beta:worker poll` |
| Upload files / attach repos | `shared/managed-agents-environments.md` (Resources) |
| Give agents persistent memory across sessions | `shared/managed-agents-memory.md` — memory stores, `memory_store` session resource, preconditions, versions/redact |
| Define agents/environments as version-controlled YAML; drive the API from the shell | `shared/anthropic-cli.md``ant beta:agents create < agent.yaml`, `--transform`, `@file` inlining |
| Store MCP credentials | `shared/managed-agents-tools.md` (Vaults section) |
| Call a non-MCP API / CLI that needs a secret | `shared/managed-agents-client-patterns.md` Pattern 9 — no container env vars; vaults are MCP-only; keep the secret host-side via a custom tool |
@@ -60,6 +61,7 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
- **Agent FIRST, then session — NO EXCEPTIONS** — the session's `agent` field accepts **only** a string ID or `{type: "agent", id, version}`. `model`, `system`, `tools`, `mcp_servers`, `skills` are **top-level fields on `POST /v1/agents`**, never on `sessions.create()`. If the user hasn't created an agent, that is step zero of every example.
- **Agent ONCE, not every run** — `agents.create()` is a setup step. Store the returned `agent_id` and reuse it; don't call `agents.create()` at the top of your hot path. If the agent's config needs to change, `POST /v1/agents/{id}` — each update creates a new version, and sessions can pin to a specific version for reproducibility.
- **MCP auth goes through vaults** — the agent's `mcp_servers` array declares `{type, name, url}` only (no auth). Credentials live in vaults (`client.beta.vaults.credentials.create`) and attach to sessions via `vault_ids`. Anthropic auto-refreshes OAuth tokens using the stored refresh token.
- **Reconcile resources before the first run** — a session with a clear ask but a missing tool, credential, data mount, or context will discover the gap mid-run, then flail and give up. Before creating the session, check that every action in the task maps to a configured tool/MCP server, every MCP server has a vault credential, and every referenced file/host is mounted/reachable. When helping a user set one up, run the reconciliation in `shared/managed-agents-onboarding.md` → §3 Pre-flight viability check.
- **Stream to get events** — `GET /v1/sessions/{id}/events/stream` is the primary way to receive agent output in real-time.
- **SSE stream has no replay — reconnect with consolidation** — if the stream drops while a `agent.tool_use`, `agent.mcp_tool_use`, or `agent.custom_tool_use` is pending resolution (`user.tool_confirmation` for the first two, `user.custom_tool_result` for the last one), the session deadlocks (client disconnects → session idles → reconnect happens → no client resolution happens). On every (re)connect: open stream with `GET /v1/sessions/{id}/events/stream` , fetch `GET /v1/sessions/{id}/events`, dedupe by event ID, then proceed. See `shared/managed-agents-events.md` → Reconnecting after a dropped stream.
- **Don't trust HTTP-library timeouts as wall-clock caps** — `requests` `timeout=(c, r)` and `httpx.Timeout(n)` are *per-chunk* read timeouts; they reset every byte, so a trickling connection can block indefinitely. For a hard deadline on raw-HTTP polling, track `time.monotonic()` at the loop level and bail explicitly. Prefer the SDK's `sessions.events.stream()` / `session.events.list()` over hand-rolled HTTP. See `shared/managed-agents-events.md` → Receiving Events.

View File

@@ -78,7 +78,7 @@ await new EnvironmentWorker({
## Run a worker — `ant` CLI (fixed tools)
The `ant` CLI ships a worker with the fixed built-in toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`). Install per the Anthropic CLI docs (see `shared/live-sources.md` → Anthropic CLI), then:
The `ant` CLI ships a worker with the fixed built-in toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`). Install per `shared/anthropic-cli.md`, then:
```sh
export ANTHROPIC_ENVIRONMENT_KEY=sk-ant-oat01-...

View File

@@ -1,5 +1,7 @@
# Model Migration Guide
> **If you arrived via `/claude-api migrate`:** this is the right file. Execute the steps below in order — do not summarize them back to the user. Start with Step 0 (confirm scope) before touching any file.
How to move existing code to newer Claude models. Covers breaking changes, deprecated parameters, and drop-in replacements for retired models.
For the latest, authoritative version (with code samples in every supported language), WebFetch the **Migration Guide** URL from `shared/live-sources.md`. Use this file for the consolidated, skill-resident reference; fall back to the live docs whenever a model launch or breaking change may have shifted the picture.
@@ -86,6 +88,7 @@ Code examples in this guide are Python. **The same fields exist in every officia
> **Verify type and method names against the SDK source before writing them into customer code.** WebFetch the relevant repository from the SDK source-code table in `shared/live-sources.md` (one row per SDK) and confirm the exact symbol — particularly for typed SDKs (Go, Java, C#) where union/builder names can differ from the JSON shape. Do not guess type names that aren't in the table below or in `<lang>/claude-api/README.md`.
<!-- The rows below were verified against each SDK's `synced/model-launch-april` branch. -->
### `thinking` — `budget_tokens` → adaptive
@@ -483,6 +486,26 @@ If the model is now overtriggering a tool or skill, the fix is almost always to
Older aliases (`claude-opus-4-7`, `claude-opus-4-6`, `claude-opus-4-5`, `claude-sonnet-4-5`, etc.) are still active and can be pinned if you need time before upgrading — see `shared/models.md` for the full legacy list.
### Amazon Bedrock model IDs
If the code uses the `AnthropicBedrockMantle` client (Python `anthropic[bedrock]`, TypeScript `@anthropic-ai/bedrock-sdk`, Java `BedrockMantleBackend`, Go `bedrock.NewMantleClient`, etc.) or targets `https://bedrock-mantle.{region}.api.aws/anthropic`, it is running on **Claude in Amazon Bedrock**. All breaking changes in this guide apply unchanged there — it serves the same Messages API shape — but model IDs carry an `anthropic.` provider prefix:
| First-party ID | Bedrock ID |
|---|---|
| `claude-opus-4-8` | `anthropic.claude-opus-4-8` |
| `claude-opus-4-7` | `anthropic.claude-opus-4-7` |
| `claude-haiku-4-5` | `anthropic.claude-haiku-4-5` |
When migrating a Bedrock file, apply the same rename-table row as first-party, then keep/add the `anthropic.` prefix. Do **not** generate a first-party `claude-*` ID for a Bedrock client — it will 400.
**Skip for Bedrock:** the `code_execution_*` tool-version checklist item and the **Task Budgets** section — both are first-party-only features (Bedrock does not support server-side Anthropic tools or the `task-budgets-2026-03-13` beta). Everything else in this guide — `effort`, adaptive/extended thinking, `output_config.format`, `thinking.display`, fine-grained tool streaming, token counting — is available on Bedrock.
> **Out of scope:** the legacy Amazon Bedrock integration (`InvokeModel` / `Converse` APIs with ARN-versioned IDs like `anthropic.claude-3-5-sonnet-20241022-v2:0`) uses a different request shape and model-ID format. This guide does not cover it; WebFetch the Bedrock page in `shared/live-sources.md` if the user is migrating between the two Bedrock integrations.
### Claude Platform on AWS
If the code uses `AnthropicAWS` / `AnthropicAws` / `anthropicaws.NewClient` / `AnthropicAwsClient` (or targets `https://aws-external-anthropic.{region}.api.aws`), it is running on **Claude Platform on AWS** — Anthropic-operated, same-day API parity. Model IDs are **bare first-party** strings; apply the rename table above **verbatim** and every breaking-change section in this guide unchanged. There is nothing to skip. Do **not** add an `anthropic.` prefix (that's Amazon Bedrock, a separate offering). See `shared/claude-platform-on-aws.md` for client/auth details.
---
## Migration Checklist
@@ -496,6 +519,7 @@ For each file that calls `messages.create()` / equivalent SDK method:
- [ ] **[BLOCKS]** Move `format` from top-level `output_format` into `output_config.format`
- [ ] **[BLOCKS]** Remove any assistant-turn prefills if targeting Opus 4.6 or Sonnet 4.6 (see the prefill replacement table)
- [ ] **[BLOCKS]** Switch to streaming if `max_tokens > ~16000` (otherwise SDK HTTP timeout)
- [ ] **[TUNE]** Verify tool-input handling parses JSON rather than raw-string-matching the serialized input (4.6 may escape Unicode / forward slashes differently; most SDKs already expose `block.input` as a parsed object)
- [ ] **[TUNE]** Set `output_config={"effort": "..."}` explicitly — especially when moving Sonnet 4.5 → Sonnet 4.6 (4.6 defaults to `high`)
- [ ] **[TUNE]** Remove GA beta headers: `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `token-efficient-tools-2025-02-19`, `output-128k-2025-02-19`; remove `interleaved-thinking-2025-05-14` once on adaptive thinking
- [ ] **[TUNE]** Switch `client.beta.messages.create(...)``client.messages.create(...)` once all betas are removed

View File

@@ -77,13 +77,13 @@ curl https://api.anthropic.com/v1/models/claude-opus-4-8 \
| Claude Opus 4.5 | `claude-opus-4-5` | `claude-opus-4-5-20251101` | Active |
| Claude Opus 4.1 | `claude-opus-4-1` | `claude-opus-4-1-20250805` | Active |
| Claude Sonnet 4.5 | `claude-sonnet-4-5` | `claude-sonnet-4-5-20250929` | Active |
| Claude Sonnet 4 | `claude-sonnet-4-0` | `claude-sonnet-4-20250514` | Active |
| Claude Opus 4 | `claude-opus-4-0` | `claude-opus-4-20250514` | Active |
## Deprecated Models (retiring soon)
| Friendly Name | Alias (use this) | Full ID | Status | Retires |
|-------------------|---------------------|-------------------------------|------------|--------------|
| Claude Sonnet 4 | `claude-sonnet-4-0` | `claude-sonnet-4-20250514` | Deprecated | TBD |
| Claude Opus 4 | `claude-opus-4-0` | `claude-opus-4-20250514` | Deprecated | TBD |
| Claude Haiku 3 | — | `claude-3-haiku-20240307` | Deprecated | Apr 19, 2026 |
## Retired Models (no longer available)
@@ -115,9 +115,9 @@ When a user asks for a model by name, use this table to find the correct model I
| "sonnet", "balanced" | `claude-sonnet-4-6` |
| "sonnet 4.6" | `claude-sonnet-4-6` |
| "sonnet 4.5" | `claude-sonnet-4-5` |
| "sonnet 4", "sonnet 4.0" | `claude-sonnet-4-0` |
| "sonnet 3.7" | Retired — suggest `claude-sonnet-4-5` |
| "sonnet 3.5" | Retired — suggest `claude-sonnet-4-5` |
| "sonnet 4", "sonnet 4.0" | `claude-sonnet-4-0` (deprecated — suggest `claude-sonnet-4-6`) |
| "sonnet 3.7" | Retired — suggest `claude-sonnet-4-6` |
| "sonnet 3.5" | Retired — suggest `claude-sonnet-4-6` |
| "haiku", "fast", "cheap" | `claude-haiku-4-5` |
| "haiku 4.5" | `claude-haiku-4-5` |
| "haiku 3.5" | Retired — suggest `claude-haiku-4-5` |

View File

@@ -62,6 +62,24 @@ Many requests share a large fixed preamble (few-shot examples, retrieved docs, i
]}]
```
### Mid-conversation system messages
**Beta, model-gated.** When an operator instruction arrives mid-conversation — a mode switch, updated context, dynamically injected state — send it as `{"role": "system", "content": "..."}` appended to `messages[]`, rather than editing top-level `system`. Editing top-level `system` changes the prefix ahead of the entire conversation history, so every cached turn is re-processed uncached; a `role: "system"` message sits after the history and leaves the cached prefix intact.
```json
// Top-level system stays byte-identical; new instruction goes after the cached history
"system": [{"type": "text", "text": "<stable core>", "cache_control": {"type": "ephemeral"}}],
"messages": [
...history,
{"role": "user", "content": "..."},
{"role": "system", "content": "Terse mode enabled — keep responses under 40 words."}
]
```
This is also the prompt-injection-safe replacement for embedding operator instructions as text inside a user turn (the `<system-reminder>` pattern): both have the same caching profile, but `role: "system"` is the non-spoofable operator channel, whereas text inside user/tool content can be forged by anything that writes to user-visible input.
Requires `anthropic-beta: mid-conversation-system-2026-04-07`. Must follow a `role: "user"` message (or an assistant message ending in a server tool result); cannot be `messages[0]` — use top-level `system` for the initial prompt. Content is text-only. Model-gated — unsupported models return a 400 (`BadRequestError`: `role 'system' is not supported on this model`); catch that error and fall back to putting the instruction in a user-turn `<system-reminder>` block.
### Prompts that change from the beginning every time
Don't cache. If the first 1K tokens differ per request, there is no reusable prefix. Adding `cache_control` only pays the cache-write premium with zero reads. Leave it off.
@@ -72,7 +90,7 @@ Don't cache. If the first 1K tokens differ per request, there is no reusable pre
These are the decisions that matter more than marker placement. Fix these first.
**Keep the system prompt frozen.** Don't interpolate "current date: X", "mode: Y", "user name: Z" into the system prompt — those sit at the front of the prefix and invalidate everything downstream. Inject dynamic context as a user or assistant message later in `messages`. A message at turn 5 invalidates nothing before turn 5.
**Keep the system prompt frozen.** Don't interpolate "current date: X", "mode: Y", "user name: Z" into the system prompt — those sit at the front of the prefix and invalidate everything downstream. Inject dynamic context later in `messages` instead — as a `{"role": "system", ...}` message where supported (see § Mid-conversation system messages above), or as text in a user message otherwise. A message at turn 5 invalidates nothing before turn 5.
**Don't change tools or model mid-conversation.** Tools render at position 0; adding, removing, or reordering a tool invalidates the entire cache. Same for switching models (caches are model-scoped). If you need "modes", don't swap the tool set — give Claude a tool that records the mode transition, or pass the mode as message content. Serialize tools deterministically (sort by name).
@@ -169,3 +187,37 @@ Fix: place an intermediate breakpoint every ~15 blocks in long turns, or put the
A cache entry becomes readable only after the first response **begins streaming**. N parallel requests with identical prefixes all pay full price — none can read what the others are still writing.
For fan-out patterns: send 1 request, await the first streamed token (not the full response), then fire the remaining N1. They'll read the cache the first one just wrote.
## Pre-warming the cache
To eliminate the cache-miss latency on the *first* real request, send a **`max_tokens: 0`** request at startup (or on an interval). The API runs prefill — writing the cache at your `cache_control` breakpoint — and returns immediately with `content: []`, `stop_reason: "max_tokens"`, and a populated `usage` block (zero output tokens billed; normal cache-write charge on `cache_creation_input_tokens`).
**When to pre-warm** — pre-warming trades a cache-write charge *now* for lower TTFT on the *next* real request. It's worth it when all three hold: (a) first-request latency is user-visible (chat/voice/interactive — not background jobs), (b) the shared prefix is large enough that a cold write is noticeably slow, and (c) there's a moment *before* traffic to fire it — app startup, worker boot, post-deploy, start of a scheduled window.
| Skip pre-warming when… | Because |
|---|---|
| Traffic is continuous (requests ≤ TTL apart) | The first real request warms the cache and every subsequent one hits it; a separate warm call is a pure extra write |
| The prefix is small or below the cacheable minimum | The cold-write penalty is negligible |
| The prefix varies per request/user | Nothing shared to pre-warm |
| You'd pre-warm many distinct prefixes speculatively | Each is a ~1.25× write; cost can exceed the latency you save |
**Scheduled re-warms:** only needed when traffic has gaps longer than the TTL. If real requests arrive more often than every 5 minutes, they keep the cache warm on their own — don't add an interval re-warm. For bursty traffic with long idle gaps, either re-warm just under the TTL or switch to `ttl: "1h"` and re-warm less often.
```python
client.messages.create(
model="claude-opus-4-8",
max_tokens=0,
system=[{
"type": "text",
"text": SYSTEM_PROMPT,
"cache_control": {"type": "ephemeral"},
}],
messages=[{"role": "user", "content": "warmup"}],
)
```
**Breakpoint placement:** put `cache_control` on the **last block shared with the real request** (the system prompt or tool definitions) — **not** on the placeholder user message, and **not** via top-level automatic caching (which would key the cache to the placeholder). The placeholder can be any non-whitespace string; it's read during prefill but never answered.
**Rejected combinations:** `max_tokens: 0` is an `invalid_request_error` with `stream: true`, `thinking.type: "enabled"`, `output_config.format`, `tool_choice` of `{"type":"tool"}` or `{"type":"any"}`, or inside a Message Batches request.
**TTL still applies** — re-warm at least every 5 minutes for the default cache, or use the 1-hour TTL. This replaces the older `max_tokens: 1` workaround (no single-token reply to discard, no output tokens billed, intent is unambiguous).

View File

@@ -0,0 +1,56 @@
# Token Counting
Use the `count_tokens` endpoint (`POST /v1/messages/count_tokens`) for accurate
token counts against Claude models. Token counts are **model-specific** — pass
the same model ID you'll use for inference.
**Do not use `tiktoken`.** It's OpenAI's tokenizer. It undercounts Claude
tokens by ~1520% on typical text, and by much more on code or non-English
input. Any estimate from `tiktoken`, `gpt-tokenizer`, or similar is wrong for
Claude.
## Count a file or string
```python
from anthropic import Anthropic
client = Anthropic()
resp = client.messages.count_tokens(
model="claude-opus-4-8",
messages=[{"role": "user", "content": open("CLAUDE.md").read()}],
)
print(resp.input_tokens)
```
TypeScript: `await client.messages.countTokens({model, messages})`
`.input_tokens`. See `{lang}/claude-api/README.md` for other SDKs.
## CLI
```sh
ant messages count-tokens --model claude-opus-4-8 \
--message '{role: user, content: "@./CLAUDE.md"}' \
--transform input_tokens -r
```
## Diffing a file across two versions
The endpoint is stateless — count each version separately and subtract:
```python
from anthropic import Anthropic
import subprocess
client = Anthropic()
def count(text: str) -> int:
return client.messages.count_tokens(
model="claude-opus-4-8",
messages=[{"role": "user", "content": text}],
).input_tokens
before = subprocess.check_output(["git", "show", "HEAD:CLAUDE.md"], text=True)
after = open("CLAUDE.md").read()
print(count(after) - count(before))
```
Full docs: see the Token Counting entry in `shared/live-sources.md`.

View File

@@ -252,6 +252,26 @@ For full documentation, use WebFetch:
---
## Server-Side Tools: Advisor (Beta)
The advisor tool lets Claude consult a secondary model during a conversation. The advisor runs its own API call with a model you specify and returns its analysis to the primary model. Use it when you want a second opinion, specialized expertise, or cross-model verification without managing the orchestration yourself.
### Tool Definition
```json
{
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-sonnet-4-6"
}
```
The `model` parameter is required — it specifies which model the advisor uses for its own inference. Optional fields: `caching`, `max_uses`, `allowed_callers`, `defer_loading`, `strict`.
**Beta header required:** `advisor-tool-2026-03-01`. The SDK sets this automatically when using `client.beta.messages.create()` with advisor tools.
---
## Client-Side Tools: Memory
The memory tool enables Claude to store and retrieve information across conversations through a memory file directory. Claude can create, read, update, and delete files that persist between sessions.

View File

@@ -11,10 +11,12 @@ npm install @anthropic-ai/sdk
```typescript
import Anthropic from "@anthropic-ai/sdk";
// Default (uses ANTHROPIC_API_KEY env var)
// Default — resolves credentials from the environment:
// ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
// Prefer this for local dev; don't hardcode a key.
const client = new Anthropic();
// Explicit API key
// Explicit API key (only when you must inject a specific key)
const client = new Anthropic({ apiKey: "your-api-key" });
```
@@ -51,6 +53,32 @@ const response = await client.messages.create({
});
```
### Mid-conversation system messages (beta, model-gated)
For operator instructions that arrive mid-conversation (mode switches, injected state), append `{role: "system", ...}` to `messages` instead of editing top-level `system` — this preserves the cached prefix and carries operator authority. Must follow a user message; cannot be `messages[0]`. Unsupported models return a 400 (`role 'system' is not supported on this model`). See `shared/prompt-caching.md` for when to use this vs. top-level `system`.
```typescript
// SDK types for role:"system" in messages are pending — pass the beta header
// directly until the SDK updates, then switch to client.beta.messages.create
// with betas: ["mid-conversation-system-2026-04-07"].
const response = await client.messages.create(
{
model: MODEL_ID, // must support mid-conversation system messages
max_tokens: 16000,
system: [
{ type: "text", text: STABLE_SYSTEM, cache_control: { type: "ephemeral" } },
],
messages: [
...history,
{ role: "user", content: userMessage },
// @ts-expect-error — role:"system" pending SDK types
{ role: "system", content: "Terse mode enabled — keep responses under 40 words." },
],
},
{ headers: { "anthropic-beta": "mid-conversation-system-2026-04-07" } },
);
```
---
## Vision (Images)
@@ -297,7 +325,18 @@ The `stop_reason` field in the response indicates why the model stopped generati
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Claude wants to call a tool — execute it and continue |
| `pause_turn` | Model paused and can be resumed (agentic flows) |
| `refusal` | Claude refused for safety reasons — output may not match schema |
| `refusal` | Claude refused for safety reasons — check `stop_details` |
### Structured Stop Details
When `stop_reason` is `"refusal"`, the response includes a `stop_details` object with structured information about the refusal:
```typescript
if (response.stop_reason === "refusal" && response.stop_details) {
console.log(`Category: ${response.stop_details.category}`); // "cyber" | "bio" | null
console.log(`Explanation: ${response.stop_details.explanation}`);
}
```
---

View File

@@ -15,10 +15,12 @@ npm install @anthropic-ai/sdk
```typescript
import Anthropic from "@anthropic-ai/sdk";
// Default (uses ANTHROPIC_API_KEY env var)
// Default — resolves credentials from the environment:
// ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
// Prefer this for local dev; don't hardcode a key.
const client = new Anthropic();
// Explicit API key
// Explicit API key (only when you must inject a specific key)
const client = new Anthropic({ apiKey: "your-api-key" });
```
@@ -146,7 +148,7 @@ const [events] = await Promise.all([
]);
// Standalone stream iteration:
const stream = await client.beta.sessions.stream(
const stream = await client.beta.sessions.events.stream(
session.id,
);
@@ -161,7 +163,7 @@ for await (const event of stream) {
break;
case "agent.custom_tool_use":
// Custom tool invocation — session is now idle
console.log(`\nCustom tool call: ${event.tool_name}`);
console.log(`\nCustom tool call: ${event.name}`);
console.log(`Input: ${JSON.stringify(event.input)}`);
break;
case "session.status_idle":
@@ -221,11 +223,11 @@ function runCustomTool(toolName: string, toolInput: unknown): string {
async function runSession(client: Anthropic, sessionId: string) {
while (true) {
const stream = await client.beta.sessions.stream(
const stream = await client.beta.sessions.events.stream(
sessionId,
);
const toolCalls: Array<{ custom_tool_use_id: string; tool_name: string; input: unknown }> = [];
const toolCalls: Anthropic.Beta.Sessions.BetaManagedAgentsAgentCustomToolUseEvent[] = [];
for await (const event of stream) {
if (event.type === "agent.message") {
@@ -235,11 +237,7 @@ async function runSession(client: Anthropic, sessionId: string) {
}
}
} else if (event.type === "agent.custom_tool_use") {
toolCalls.push({
id: event.id,
tool_name: event.tool_name,
input: event.input,
});
toolCalls.push(event);
} else if (event.type === "session.status_idle") {
break;
} else if (event.type === "session.status_terminated") {
@@ -253,7 +251,7 @@ async function runSession(client: Anthropic, sessionId: string) {
const results = toolCalls.map((call) => ({
type: "user.custom_tool_result" as const,
custom_tool_use_id: call.id,
content: [{ type: "text" as const, text: runCustomTool(call.tool_name, call.input) }],
content: [{ type: "text" as const, text: runCustomTool(call.name, call.input) }],
}));
await client.beta.sessions.events.send(