Anthropic Features Reference — May 2026 (for Alex)

00Where Alex stands today

Snapshot of the live VPS state after tonight's upgrade pass. Use as the source of truth instead of memory.

Live state · 2026-05-29 14:25 UTC

OpenClaw version: 2026.5.28-beta.1 (84fd629)
Primary model: anthropic/claude-opus-4-7
Fallback chain: grok-4.20 → gemini-3.1-pro → gpt-5.2-codex → sonnet-4.6
Auth profile order: anthropic:fallback (api_key) → anthropic:default (api_key)
Cron model (26 jobs): claude-sonnet-4-6
Compaction model: gemini-2.5-flash
Discord-notify model: claude-haiku-4-5-20251001
Gateway processes: 1 (root user-systemd, alexops unit permanently disabled)
Lossless-claw: v0.10.0 npm package (old extension path retired)
Preflight: 10/10 checks passing
Heartbeat: 2h interval
Session store: /root/.openclaw/agents/main/sessions/sessions.json (213 entries)

⚠️ Why Opus 4.8 isn't primary yet

Anthropic released Opus 4.8 on May 28. OpenClaw 2026.5.27 / 5.28-beta.1 (released the same day) haven't shipped the 4.8 catalog entry yet. Direct Anthropic API calls with claude-opus-4-8 work — but the OpenClaw gateway rejects the model with "Unknown model" because it's not in models.providers.anthropic.models[] with the proper schema.

Plan: Watch the npm registry. As soon as the next OpenClaw stable (likely 2026.5.29+) lands with the 4.8 catalog entry, run sudo openclaw update --yes and swap primary. Until then 4.7 is the right choice — same $5/$25 pricing, only marginal behavioral differences.

01The Claude 4.x model lineup

All currently-available Claude 4.x models, what they cost, and what we use each for.

Model	Released	Input $/MTok	Output $/MTok	Context	Max Out	What we use it for
Opus 4.8	May 28 2026	$5	$25	1M default	128K	NEXT PRIMARY Pending OpenClaw catalog
Opus 4.7	Apr 16 2026	$5	$25	1M	128K	CURRENT PRIMARY Alex main conversations
Opus 4.6	Feb 5 2026	$5	$25	1M	128K	Configured but unused now (legacy)
Opus 4.5	Nov 24 2025	$5	$25	200K	64K	Not used
Sonnet 4.6	Feb 17 2026	$3	$15	1M	64K	CRON DEFAULT All 26 cron jobs + fallback #4
Haiku 4.5	Oct 15 2025	$1	$5	200K	32K	HEARTBEAT Discord-notify + cheap subtasks

🧠 Mental model — when to pick which

Opus 4.7 / 4.8 Hard reasoning, agentic work that needs to chain 30+ steps, anything that touches money or client-facing output, code that has to pass tests. The $25/MTok price is real but the work usually finishes in one pass.

Sonnet 4.6 Default for cron jobs. Anything where the agent has structured output, follows a template, or makes the same kind of decision repeatedly. 5x cheaper than Opus output, often as good for repetitive work.

Haiku 4.5 Heartbeats, status updates, quick classifications, simple "did X happen" checks. Anything where speed matters more than depth. 25x cheaper on output than Opus.

Cost rule of thumb If you're spending more than $0.10 per agent turn, you're using the wrong model. Tier down aggressively for any task that isn't gated on "the right answer."

02What's actually different in Opus 4.8

May 28 release. Same $5/$25 pricing as 4.7. The differences are behavioral + small new APIs.

4× fewer "code passes review with bugs"

WIRE WHEN AVAILABLE

Capability · honesty about progress

Anthropic measured 4× reduction in cases where Claude wrote code with a flaw and reported success. Better self-checking, more willingness to say "I'm not sure," more pushback on bad plans.

Why for Alex: Cron jobs that report success on broken runs are the most expensive failure mode. 4× reduction here is a direct cost lever.

Mid-conversation system messages

WIRE WHEN AVAILABLE

API · 4.8 only · no beta header

You can now insert role: "system" entries at any position in the messages array. Preserves prompt cache hits when instructions change mid-session.

// Inject a new instruction mid-conversation
messages: [
  { role: "user", ... },
  { role: "assistant", ... },
  { role: "system", content: "From now on, use formal voice" },
  { role: "user", ... }
]

How for Alex: Long Discrawl live tail sessions or multi-day client work — push policy updates without breaking cache.

Refusal stop_details

WIRE WHEN AVAILABLE

API · 4.8 only · no beta header

When Claude declines, response includes a stop_details.refusal_category. Lets you route different refusal classes (safety, capability, ambiguity) to different next steps.

How for Alex: If a cron job hits a refusal, you can branch: retry with different prompt vs notify Ty vs skip. Today, all refusals look the same.

Effort = high default

WIRE NOW (works on 4.7 too)

Parameter · already GA on 4.6+

On 4.8 the effort param defaults to high across all surfaces. Levels: low · medium · high · xhigh · max.

output_config: {
  effort: "xhigh"  // for coding + agentic
}

How for Alex: Set cron jobs that just summarize → low / medium. Heavy intel + coding crons → high / xhigh. Direct cost lever.

Adaptive thinking

VERIFY WIRED

API · GA on 4.6+ · default

Only triggers reasoning when the model judges it's needed. Replaces the old thinking: {type: "enabled", budget_tokens: N} pattern which 400s on 4.7+.

thinking: { type: "adaptive" }
// or omit entirely — off by default on 4.7+

Why for Alex: Verify no skill/agent config pins type: "enabled" — that breaks on 4.7+. The migration was probably done in the 4.6→4.7 jump but worth a sanity check.

Fast Mode 3× cheaper

PARKED — we don't use fast mode

Research preview · API only

2.5× output speed at 2× standard pricing ($10/$50 per MTok). Down from prior 6× premium. Opus 4.6 fast mode deprecated — removal 30 days after launch (~June 27).

Why park: Alex doesn't use fast mode anywhere. Non-issue unless we want to add it for latency-sensitive interactive sessions.

03Dynamic Workflows — what we can copy, what's blocked

Big new feature in Claude Code Max/Team/Enterprise. Not available via the raw API. Means our path is partial.

Workflows in Claude Code

CC ONLY · NOT FOR ALEX

Research preview · Max/Team/Enterprise · v2.1.154+

Claude writes a JavaScript orchestration script for your task, runtime executes it, fans work across up to 16 concurrent + 1000 total subagents, with adversarial review pairs per file.

Activate: include word workflow in prompt, enable ultracode setting, or run /deep-research.

Reference case: Bun rewrite from Zig → Rust. 750K lines of Rust, 99.8% test pass, 11 days, hundreds of parallel agents.

For Ty's MacBook: Turn on. Use for codebase-scale work like the Reputation Shield SaaS build whenever we kick that off.

Workflow pattern for Alex

PILOT IN OPENCLAW

Our gateway · DIY

The pattern is reproducible without Anthropic's hosted workflows: spawn N sub-sessions via sessions_spawn, each on a slice of the task, with a reviewer agent comparing pairs.

OpenClaw's cron.maxConcurrent defaulted to 4 (raised to 8 in 5.26). Subagents have maxConcurrent: 8 already.

How for Alex: Try on the next research-style cron (community-intel-scan, dashboard-playbook updates). Sketch: research agent → 5 parallel writers → reviewer → merge.

04Tools that became GA — wire these

All these went GA between Nov 2025 and Feb 2026. Most are no-beta-header now. Free wins if not already used.

Advisor tool

PILOT FOR COST

Public beta · header advisor-tool-2026-03-01

Pair a fast executor (Haiku) with a high-intelligence advisor (Opus). Most token spend at executor rates; advisor injects strategic guidance mid-generation.

Why for Alex: Memory notes API key billing ~$50/day. This is the single biggest cost lever available — could 40-60% on cron jobs that don't actually need Opus-quality token-by-token.

How: Pilot on community-intel-scan. Haiku 4.5 executor + Opus 4.7 advisor. Compare cost vs all-Opus baseline over 1 run.

Memory tool (GA Feb 17)

CONFLICTS WITH LOSSLESS-CLAW

No beta header

Server-side memory: Claude stores and retrieves info across conversations. Anthropic handles persistence + retrieval. No infrastructure needed on our side.

Conflict: Alex already runs lossless-claw v0.10.0 for cross-session memory + QMD for vector search. Adding Anthropic's memory tool on top would duplicate state and cost.

How: Don't enable on primary. Could test on one isolated skill (copywriter-agent) to compare quality vs lossless-claw.

Tool search tool (GA Feb 17)

QUIET WIN

No beta header

Claude dynamically discovers + loads tools from a large catalog on demand. Solves the "30+ tools = giant system prompt" problem.

Why for Alex: Alex has dozens of skills + MCP tools. Loading them all into system prompt eats tokens. Tool search lets her pull only what she needs per turn.

How: Likely already wired in OpenClaw 5.26+ ("tool-search catalogs do less hot-path rediscovery"). Verify via openclaw plugins list.

Code execution tool v2

UPGRADE IF USING V1

No beta header · GA

Bash + file manipulation + write code in any language. Replaces the original Python-only code-exec. FREE when used with web search or web fetch.

How: Already on v2 in newer SDKs. Confirm Alex's gateway is using the new tool name code_execution not the old python.

Web search GA + Brave LLM-context

ALREADY ON

No beta header · GA Feb 17

Native web search tool returns rich SEC filing data (May 18), supports dynamic filtering via code-exec (Feb 17). Memory notes Alex uses Brave LLM-context mode.

State: Memory says tools.web.search.brave.mode: "llm-context" is enabled. Verify still set after upgrade.

Programmatic tool calling (GA Feb 17)

QUIET WIN

No beta header

Claude calls tools from inside code execution. Reduces latency + token usage in multi-tool workflows.

How: Built into newer SDKs. Tools listed in code_execution tool-runner can be chained without surfacing every call to the conversation.

Structured outputs (GA Jan 29)

CONSIDER FOR CRONS

No beta header · Opus 4.5/4.7 + Haiku 4.5

Guaranteed JSON schema conformance. Strict tool use validation. Param moved to output_config.format.

How: Cron jobs that should write structured ClickUp tasks or dashboard rows — switch to structured outputs to eliminate format-parsing errors.

Web fetch tool

ALREADY ON

Public beta

Retrieve full content from web pages + PDFs. Different from web search (which returns snippets).

State: Alex uses for KB sync, intel ingestion. No action needed.

05Prompt caching — the cost lever you're probably leaking

Caching is up to 90% cost reduction + 80% latency reduction when it hits. When it misses you pay full price and don't notice. Anthropic shipped tools to fix that.

Automatic caching (Feb 19)

WIRE

No manual breakpoints needed

Single cache_control field + system caches the last cacheable block and moves it forward as conversation grows. No manual placement of cache breakpoints.

cache_control: { type: "ephemeral" }

Why for Alex: Long sessions accumulate cache. Without auto-caching, you have to update cache breakpoint placement every time the prompt grows. Auto-caching removes that maintenance.

Cache diagnostics (May 13)

PILOT 1 WEEK

Beta header cache-diagnosis-2026-04-07

Pass diagnostics.previous_message_id on a request — API returns cache_miss_reason explaining where the cache prefix diverged.

Why for Alex: Cache misses on Opus 4.7 cost full $5/MTok input. If 30% of Alex's sessions are missing cache she didn't realize she'd miss, this finds it.

How: Enable for one week on the gateway's anthropic transport. Log cache_miss_reason to alex-cache-debug.log. Review top 5 causes. Often: skill order changes, MEMORY.md edits, AGENTS.md drift.

1-hour cache duration

ALREADY GA

GA Aug 13 2025 · no beta header

In addition to 5-minute default, set 1-hour cache TTL. Better for long-running cron jobs that fire multiple times per hour.

cache_control: { type: "ephemeral", ttl: "1h" }

How for Alex: Set on stable startup prompts that don't change between cron runs. The 1h TTL means a job that fires every 30min hits cache the second time.

Min cacheable prompt lowered

4.8 BENEFIT

Opus 4.8 · 1024 tokens (down from 4.7)

Smaller prompts cache successfully on Opus 4.8 than 4.7. Means even short cron jobs benefit from caching.

Effect: When Alex moves to 4.8, more of her workloads will hit the cache automatically.

06Claude Managed Agents — Anthropic's OpenClaw competitor

Anthropic's hosted agent runtime. We already have OpenClaw; this section is to know what's there in case we ever migrate.

Managed Agents runtime

PARK — we run OpenClaw

Public beta Apr 8 · header managed-agents-2026-04-01

3 new endpoint groups: /v1/agents (definition), /v1/sessions (run), /v1/environments (sandbox). Built-in tools, sandboxing, SSE streaming.

Why park: Migrating Alex from OpenClaw → Managed Agents is a 60-90 day rebuild. Same functional surface for marginal benefit. Track but don't migrate.

Multiagent sessions + Outcomes

PARK

Public beta May 6

Define outcomes for an agent + spawn sub-agents inside Managed Agents. Their version of the Workflows pattern but server-side.

Memory for Managed Agents

PARK · lossless-claw covers

Public beta Apr 23

Anthropic-hosted memory layer for Managed Agents. Cross-conversation state.

Self-hosted sandboxes

PARK

Public beta May 19

Run Managed Agent tool execution in YOUR infrastructure instead of Anthropic's. Useful for compliance but only if you've already adopted Managed Agents.

MCP tunnels

PARK · we run our own gateway

Research preview May 19

Expose your private MCP servers to Anthropic-hosted agents via secure tunnel. Only relevant if Anthropic agents need to reach our private MCPs.

Webhooks for Managed Agents

PARK

May 6 (CA) / May 29 (AWS)

Session + vault lifecycle event webhooks.

07Five-month timeline · all the releases that matter

Every Anthropic release Jan–May 2026 in chronological order. Use as a quick scan to find anything you missed.

MAY 29 2026

Managed Agents on AWS expanded

Webhooks, multiagent orchestration, self-hosted sandboxes now available on Claude Platform on AWS.

MAY 28 2026

🚀 Opus 4.8 released

Mid-conversation system messages, refusal stop_details, effort=high default, adaptive thinking optimizations, fast mode at 3× lower price. Opus 4.6 fast mode deprecated (~June 27 removal). Workflows research preview in Claude Code (Max/Team/Enterprise).

MAY 19 2026

MCP Tunnels + Managed Agents self-hosted sandboxes

Connect to private MCPs from Anthropic-hosted agents. Self-hosted sandboxes for tool execution. Large outputs auto-spill to file (100K+ tokens).

MAY 18 2026

Web search adds SEC filings

Richer financial data sourcing.

MAY 13 2026

Cache diagnostics public beta

API returns cache_miss_reason. Header: cache-diagnosis-2026-04-07.

MAY 12 2026

Fast mode on Opus 4.7

Same speed/price model as 4.6 fast mode.

MAY 11 2026

Claude Platform on AWS

Anthropic-managed infra via AWS endpoints with AWS billing + IAM.

MAY 6 2026

Multiagent sessions + Outcomes (Managed Agents)

Hosted multi-agent orchestration. Webhooks. Vault background refresh.

APR 30 2026

1M context beta retired on Sonnet 4 / 4.5

Move to Sonnet 4.6 or Opus 4.6+ for 1M (now GA at standard pricing).

APR 23 2026

Memory tool for Managed Agents (beta)

APR 20 2026

Haiku 3 retired

Migrate to Haiku 4.5.

APR 16 2026

🚀 Opus 4.7 released

New tokenizer (1-1.35× more tokens per char), prefilling removed, sampling params removed, manual extended thinking removed (use adaptive), high-res image support (2576px), more direct tone, fewer subagents/tool calls by default.

APR 14 2026

Sonnet 4 + Opus 4 deprecation announced

Retire June 15 2026.

APR 9 2026

🛠 Advisor tool launched

Executor + advisor pairing for cost-efficient long-horizon work. Header: advisor-tool-2026-03-01.

APR 8 2026

🛠 Claude Managed Agents + ant CLI launched

FEB 19 2026

Automatic caching launched

Auto-rolling cache breakpoints. Sonnet 3.7 + Haiku 3.5 retired.

FEB 17 2026

🚀 Sonnet 4.6 released

Code execution FREE with web search/fetch. Web search GA. Tool search tool GA. Memory tool GA. Programmatic tool calling GA.

FEB 5 2026

🚀 Opus 4.6 + Compaction API + Data Residency

Adaptive thinking default. Manual thinking deprecated. No prefill on Opus 4.6+. 1M context beta.

JAN 29 2026

Structured outputs GA

JSON schema conformance + strict tool use. output_format → output_config.format.

08The one-screen cheat sheet

If you remember nothing else about this document, remember this page.

🎯 Wire these into Alex first

Cache diagnostics Header cache-diagnosis-2026-04-07. Pass diagnostics.previous_message_id. Log cache_miss_reason for a week. Cheapest insight into cost leaks.

Effort tuning per cron Set low/medium on cheap cron jobs (heartbeats, status), high/xhigh on heavy work. 30-50% cost reduction possible on the cheap end.

Advisor tool pilot Pair Haiku 4.5 executor + Opus 4.7 advisor on community-intel-scan. Measure cost delta for 1 week. If >40% savings with comparable output, expand to other heavy crons.

1-hour cache TTL on cron startup prompts cache_control: {type: "ephemeral", ttl: "1h"}. Cron jobs that fire multiple times an hour stop paying for re-caching.

Opus 4.8 swap Pending OpenClaw catalog support. Watch openclaw update status. As soon as 4.8 appears in openclaw models list --provider anthropic with full metadata, run model swap.

🚫 Park these — don't get distracted

Claude Managed Agents It's Anthropic's OpenClaw. We have OpenClaw. Migrating = 60-90 day rebuild for marginal benefit. Track but don't move.

Memory tool (Anthropic-hosted) Lossless-claw + QMD already do this for Alex. Adding the Anthropic version would duplicate state and cost.

MCP tunnels For exposing private MCPs to Anthropic-hosted agents. We don't use Anthropic-hosted agents.

Claude Platform on AWS We're not on AWS. Pricing/billing convenience for AWS-native shops.

⚡ Breaking changes to KNOW (not learn)

Sampling params blocked on Opus 4.7+ temperature, top_p, top_k at non-default values return 400. Omit them. Prompting is the way to steer.

Manual extended thinking blocked on Opus 4.7+ thinking: {type: "enabled", budget_tokens: N} returns 400. Use type: "adaptive".

Prefill blocked on Opus 4.6+ Can't prefill assistant messages. Use structured outputs or system instructions instead.

Opus 4.7+ uses more tokens per char New tokenizer = 1-1.35× more tokens for same text. Budget max_tokens with 35% headroom.

Thinking content omitted by default on 4.7+ Set thinking.display: "summarized" if you want visible reasoning during streaming.

Sonnet 4 + Opus 4 retire June 15 2026 Already migrated. Don't reintroduce old model IDs in any config.