(Photo by Brett Jordan on Unsplash)
✳️ tl;dr
- Claude Code introduces a Checkpoints feature that saves progress as you work and lets you roll back to an earlier state, addressing a pain point in long-running development sessions [1][2]
- A native VS Code extension is released, showing Claude Code's changes as real-time inline diffs
- Claude Sonnet 4.5 achieves 77.2% on SWE-bench Verified (the full set of 500 problems), using only simple bash and file-editing tools [1][3]
- A high-compute configuration with parallel test-time compute reaches 82.0%, using rejection sampling and an internal scoring model to select the best candidate [1]
- The API adds a memory tool and automatic context management, allowing agents to maintain context across long-running tasks [1][4]
- Pricing is unchanged from Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens, with prompt caching cutting costs by up to 90% [5]
- Cursor's CEO says Claude Sonnet 4.5 delivers state-of-the-art coding performance on long-horizon tasks, making it the top choice for developers solving complex problems [1]
- Achieves 61.4% on the OSWorld benchmark, ahead of all competitors; Sonnet 4 scored only 42.2% four months earlier
- Ships under the AI Safety Level 3 (ASL-3) protection framework; the false positive rate of CBRN threat detection is down by a factor of ten since it was first described
- Makes progress on defending against prompt injection attacks, one of the most serious security risks facing agentic AI systems
- The System Card includes an interpretability assessment for the first time, improving model transparency and credibility [6]
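
The high-compute result above relies on rejection sampling followed by score-based selection. The sketch below is only a toy illustration of that general pattern, not Anthropic's actual pipeline: the `Candidate` class, the `passes_regression` flag used as the rejection criterion, and the `score` field standing in for an internal scoring model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    patch: str               # proposed change, e.g. a diff (hypothetical)
    passes_regression: bool  # assumed rejection criterion: existing tests still pass
    score: float             # stand-in for the output of a scoring model


def select_best(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Reject failing candidates, then return the highest-scoring survivor."""
    # Rejection step: drop every candidate that fails the filter.
    survivors = [c for c in candidates if c.passes_regression]
    if not survivors:
        return None
    # Selection step: keep the survivor with the highest score.
    return max(survivors, key=lambda c: c.score)


attempts = [
    Candidate("patch-a", passes_regression=True, score=0.41),
    Candidate("patch-b", passes_regression=False, score=0.97),
    Candidate("patch-c", passes_regression=True, score=0.78),
]
best = select_best(attempts)
print(best.patch if best else "no candidate survived")  # prints "patch-c"
```

Note that `patch-b` has the highest score but is never considered, because the rejection step runs before scoring decides anything.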
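
To make the pricing bullet concrete, here is a rough cost estimator built on the list prices quoted above. It assumes that the "up to 90%" saving means cached input tokens are billed at 10% of the normal input price, and it ignores any surcharge for writing to the cache. The function name `estimate_cost` and the `cached_fraction` parameter are illustrative, not part of any API.

```python
INPUT_USD_PER_MTOK = 3.00     # from the pricing quoted above
OUTPUT_USD_PER_MTOK = 15.00   # from the pricing quoted above
CACHE_READ_MULTIPLIER = 0.10  # assumption: cached input costs 10% of list price


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0) -> float:
    """Estimate request cost in USD; cache-write surcharges are ignored."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached tokens are billed at the discounted rate, the rest at full price.
    billable_input = uncached + cached * CACHE_READ_MULTIPLIER
    input_cost = billable_input * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# 1M input tokens and 100k output tokens, without and with 80% cache hits.
print(f"{estimate_cost(1_000_000, 100_000):.2f}")       # prints 4.50
print(f"{estimate_cost(1_000_000, 100_000, 0.8):.2f}")  # prints 2.34
```

Under these assumptions only the input side shrinks; output tokens are billed at the full rate regardless of caching, so output-heavy workloads see a smaller overall saving.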
✳️ Further Reading
SWE-bench/SWE-bench: SWE-bench: Can Language Models Resolve Real-world Github Issues?
Introducing Claude Sonnet 4.5 in Amazon Bedrock: Anthropic’s most intelligent model, best for coding and complex agents | AWS News Blog
Anthropic launches Claude Sonnet 4.5, its best AI model for coding | TechCrunch