A single-page reference of every Anthropic release, tool, and capability shipped Jan–May 2026 — organized by what to wire into Alex now, what to pilot, and what to park. Built for Ty's memory, not Anthropic's docs.
Snapshot of the live VPS state after tonight's upgrade pass. Use as the source of truth instead of memory.
Anthropic released Opus 4.8 on May 28. OpenClaw 2026.5.27 / 5.28-beta.1 (released the same day) haven't shipped the 4.8 catalog entry yet. Direct Anthropic API calls with claude-opus-4-8 work — but the OpenClaw gateway rejects the model with "Unknown model" because it's not in models.providers.anthropic.models[] with the proper schema.
Plan: Watch the npm registry. As soon as the next OpenClaw stable (likely 2026.5.29+) lands with the 4.8 catalog entry, run sudo openclaw update --yes and swap primary. Until then 4.7 is the right choice — same $5/$25 pricing, only marginal behavioral differences.
All currently-available Claude 4.x models, what they cost, and what we use each for.
| Model | Released | Input $/MTok | Output $/MTok | Context | Max Out | What we use it for |
|---|---|---|---|---|---|---|
| Opus 4.8 | May 28 2026 | $5 | $25 | 1M default | 128K | NEXT PRIMARY Pending OpenClaw catalog |
| Opus 4.7 | Apr 16 2026 | $5 | $25 | 1M | 128K | CURRENT PRIMARY Alex main conversations |
| Opus 4.6 | Feb 5 2026 | $5 | $25 | 1M | 128K | Configured but unused now (legacy) |
| Opus 4.5 | Nov 24 2025 | $5 | $25 | 200K | 64K | Not used |
| Sonnet 4.6 | Feb 17 2026 | $3 | $15 | 1M | 64K | CRON DEFAULT All 26 cron jobs + fallback #4 |
| Haiku 4.5 | Oct 15 2025 | $1 | $5 | 200K | 32K | HEARTBEAT Discord-notify + cheap subtasks |
May 28 release. Same $5/$25 pricing as 4.7. The differences are behavioral + small new APIs.
Anthropic measured 4× reduction in cases where Claude wrote code with a flaw and reported success. Better self-checking, more willingness to say "I'm not sure," more pushback on bad plans.
You can now insert role: "system" entries at any position in the messages array. Preserves prompt cache hits when instructions change mid-session.
// Inject a new instruction mid-conversation messages: [ { role: "user", ... }, { role: "assistant", ... }, { role: "system", content: "From now on, use formal voice" }, { role: "user", ... } ]
When Claude declines, response includes a stop_details.refusal_category. Lets you route different refusal classes (safety, capability, ambiguity) to different next steps.
On 4.8 the effort param defaults to high across all surfaces. Levels: low · medium · high · xhigh · max.
output_config: { effort: "xhigh" // for coding + agentic }
low / medium. Heavy intel + coding crons → high / xhigh. Direct cost lever.Only triggers reasoning when the model judges it's needed. Replaces the old thinking: {type: "enabled", budget_tokens: N} pattern which 400s on 4.7+.
thinking: { type: "adaptive" } // or omit entirely — off by default on 4.7+
type: "enabled" — that breaks on 4.7+. The migration was probably done in the 4.6→4.7 jump but worth a sanity check.2.5× output speed at 2× standard pricing ($10/$50 per MTok). Down from prior 6× premium. Opus 4.6 fast mode deprecated — removal 30 days after launch (~June 27).
Big new feature in Claude Code Max/Team/Enterprise. Not available via the raw API. Means our path is partial.
Claude writes a JavaScript orchestration script for your task, runtime executes it, fans work across up to 16 concurrent + 1000 total subagents, with adversarial review pairs per file.
Activate: include word workflow in prompt, enable ultracode setting, or run /deep-research.
The pattern is reproducible without Anthropic's hosted workflows: spawn N sub-sessions via sessions_spawn, each on a slice of the task, with a reviewer agent comparing pairs.
OpenClaw's cron.maxConcurrent defaulted to 4 (raised to 8 in 5.26). Subagents have maxConcurrent: 8 already.
All these went GA between Nov 2025 and Feb 2026. Most are no-beta-header now. Free wins if not already used.
Pair a fast executor (Haiku) with a high-intelligence advisor (Opus). Most token spend at executor rates; advisor injects strategic guidance mid-generation.
community-intel-scan. Haiku 4.5 executor + Opus 4.7 advisor. Compare cost vs all-Opus baseline over 1 run.Server-side memory: Claude stores and retrieves info across conversations. Anthropic handles persistence + retrieval. No infrastructure needed on our side.
Claude dynamically discovers + loads tools from a large catalog on demand. Solves the "30+ tools = giant system prompt" problem.
openclaw plugins list.Bash + file manipulation + write code in any language. Replaces the original Python-only code-exec. FREE when used with web search or web fetch.
code_execution not the old python.Native web search tool returns rich SEC filing data (May 18), supports dynamic filtering via code-exec (Feb 17). Memory notes Alex uses Brave LLM-context mode.
tools.web.search.brave.mode: "llm-context" is enabled. Verify still set after upgrade.Claude calls tools from inside code execution. Reduces latency + token usage in multi-tool workflows.
code_execution tool-runner can be chained without surfacing every call to the conversation.Guaranteed JSON schema conformance. Strict tool use validation. Param moved to output_config.format.
Retrieve full content from web pages + PDFs. Different from web search (which returns snippets).
Caching is up to 90% cost reduction + 80% latency reduction when it hits. When it misses you pay full price and don't notice. Anthropic shipped tools to fix that.
Single cache_control field + system caches the last cacheable block and moves it forward as conversation grows. No manual placement of cache breakpoints.
cache_control: { type: "ephemeral" }
Pass diagnostics.previous_message_id on a request — API returns cache_miss_reason explaining where the cache prefix diverged.
cache_miss_reason to alex-cache-debug.log. Review top 5 causes. Often: skill order changes, MEMORY.md edits, AGENTS.md drift.In addition to 5-minute default, set 1-hour cache TTL. Better for long-running cron jobs that fire multiple times per hour.
cache_control: { type: "ephemeral", ttl: "1h" }
Smaller prompts cache successfully on Opus 4.8 than 4.7. Means even short cron jobs benefit from caching.
Anthropic's hosted agent runtime. We already have OpenClaw; this section is to know what's there in case we ever migrate.
3 new endpoint groups: /v1/agents (definition), /v1/sessions (run), /v1/environments (sandbox). Built-in tools, sandboxing, SSE streaming.
Define outcomes for an agent + spawn sub-agents inside Managed Agents. Their version of the Workflows pattern but server-side.
Anthropic-hosted memory layer for Managed Agents. Cross-conversation state.
Run Managed Agent tool execution in YOUR infrastructure instead of Anthropic's. Useful for compliance but only if you've already adopted Managed Agents.
Expose your private MCP servers to Anthropic-hosted agents via secure tunnel. Only relevant if Anthropic agents need to reach our private MCPs.
Session + vault lifecycle event webhooks.
Every Anthropic release Jan–May 2026 in chronological order. Use as a quick scan to find anything you missed.
Webhooks, multiagent orchestration, self-hosted sandboxes now available on Claude Platform on AWS.
Mid-conversation system messages, refusal stop_details, effort=high default, adaptive thinking optimizations, fast mode at 3× lower price. Opus 4.6 fast mode deprecated (~June 27 removal). Workflows research preview in Claude Code (Max/Team/Enterprise).
Connect to private MCPs from Anthropic-hosted agents. Self-hosted sandboxes for tool execution. Large outputs auto-spill to file (100K+ tokens).
Richer financial data sourcing.
API returns cache_miss_reason. Header: cache-diagnosis-2026-04-07.
Same speed/price model as 4.6 fast mode.
Anthropic-managed infra via AWS endpoints with AWS billing + IAM.
Hosted multi-agent orchestration. Webhooks. Vault background refresh.
Move to Sonnet 4.6 or Opus 4.6+ for 1M (now GA at standard pricing).
Migrate to Haiku 4.5.
New tokenizer (1-1.35× more tokens per char), prefilling removed, sampling params removed, manual extended thinking removed (use adaptive), high-res image support (2576px), more direct tone, fewer subagents/tool calls by default.
Retire June 15 2026.
Executor + advisor pairing for cost-efficient long-horizon work. Header: advisor-tool-2026-03-01.
Auto-rolling cache breakpoints. Sonnet 3.7 + Haiku 3.5 retired.
Code execution FREE with web search/fetch. Web search GA. Tool search tool GA. Memory tool GA. Programmatic tool calling GA.
Adaptive thinking default. Manual thinking deprecated. No prefill on Opus 4.6+. 1M context beta.
JSON schema conformance + strict tool use. output_format → output_config.format.
If you remember nothing else about this document, remember this page.
cache-diagnosis-2026-04-07. Pass diagnostics.previous_message_id. Log cache_miss_reason for a week. Cheapest insight into cost leaks.
low/medium on cheap cron jobs (heartbeats, status), high/xhigh on heavy work. 30-50% cost reduction possible on the cheap end.
cache_control: {type: "ephemeral", ttl: "1h"}. Cron jobs that fire multiple times an hour stop paying for re-caching.
openclaw update status. As soon as 4.8 appears in openclaw models list --provider anthropic with full metadata, run model swap.
temperature, top_p, top_k at non-default values return 400. Omit them. Prompting is the way to steer.
thinking: {type: "enabled", budget_tokens: N} returns 400. Use type: "adaptive".
max_tokens with 35% headroom.
thinking.display: "summarized" if you want visible reasoning during streaming.