Introducing Claude Sonnet 4.5

Published: 2025-09-30

Lastmod: 2025-10-01

Random Notes

by Ernest Chiang

(Photo by Brett Jordan on Unsplash)

✳️ tl;dr

Claude Code adds a Checkpoints feature that saves progress as you work and lets you roll back to a previous state, addressing a pain point in long-running development work ¹²
A native VS Code extension has been released; it shows Claude Code’s code changes as real-time inline diffs
Claude Sonnet 4.5 scores 77.2% on SWE-bench Verified (the full 500 problems) using only simple bash and file-editing tools ¹³

A high-compute configuration with parallel test-time compute reaches 82.0% on SWE-bench Verified, using rejection sampling and an internal scoring model to select the best candidate ¹
The API adds a memory tool and automatic context management, letting agents maintain context across long-running tasks ¹⁴
Pricing is unchanged from Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens, with prompt caching cutting costs by up to 90% ⁵

Cursor’s CEO says Claude Sonnet 4.5 shows state-of-the-art coding performance on long-horizon tasks, making it the top choice for developers solving complex problems ¹

Scores 61.4% on the OSWorld benchmark, ahead of all competitors; Sonnet 4 scored only 42.2% four months earlier
Ships under the AI Safety Level 3 (ASL-3) protection framework; the false positive rate of the CBRN (chemical, biological, radiological, and nuclear) threat classifiers has dropped tenfold since they were first described
Makes progress on defending against prompt injection attacks, one of the most serious security risks facing agentic AI systems
The System Card includes an “Interpretability” technical assessment for the first time, improving model transparency and credibility ⁶
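
The high-compute SWE-bench configuration mentioned above combines rejection sampling with a scoring model. The following is only a minimal sketch of that general pattern, not Anthropic's actual pipeline: `Candidate`, `select_best`, `passes_checks`, and `score` are all hypothetical names, and the rejection criterion and scoring function here are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One sampled attempt at a task (hypothetical container)."""
    patch: str


def select_best(
    candidates: Sequence[Candidate],
    passes_checks: Callable[[Candidate], bool],
    score: Callable[[Candidate], float],
) -> Optional[Candidate]:
    """Best-of-n selection: reject failing candidates, then pick the top scorer.

    Assumption: `passes_checks` stands in for whatever filter rejects bad
    attempts, and `score` stands in for an internal scoring model.
    """
    # Rejection step: drop every candidate that fails the filter.
    survivors = [c for c in candidates if passes_checks(c)]
    if not survivors:
        return None  # every attempt was rejected
    # Selection step: keep the survivor the scoring function ranks highest.
    return max(survivors, key=score)


# Toy usage: reject patches containing "bug", prefer longer patches.
attempts = [Candidate("fix"), Candidate("longer fix"), Candidate("long bug patch")]
best = select_best(
    attempts,
    passes_checks=lambda c: "bug" not in c.patch,
    score=lambda c: float(len(c.patch)),
)
```

In this toy run the third attempt is rejected even though it would score highest, so `best` is the "longer fix" candidate. The point of the pattern is that extra parallel attempts only help when the filter and the scorer can tell good attempts from bad ones.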

⁷
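
To make the pricing figures in the tl;dr concrete, here is a rough cost estimator. It is a sketch under stated assumptions: the two per-token prices come from the list above, while `estimate_cost_usd`, `cached_fraction`, and the flat-discount model of the "up to 90%" caching saving are illustrative simplifications, not the actual billing rules.

```python
# Prices per million tokens, as quoted in the tl;dr.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

# Assumption: the "up to 90%" saving is modeled as a flat discount applied
# to the cached share of input tokens. Real billing may differ.
MAX_CACHE_DISCOUNT = 0.90


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_fraction: float = 0.0,
    cache_discount: float = MAX_CACHE_DISCOUNT,
) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached input tokens are billed at the discounted rate in this model.
    billable_input = uncached + cached * (1.0 - cache_discount)
    input_cost = billable_input * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# One million tokens in and out, first without caching, then fully cached input.
no_cache = estimate_cost_usd(1_000_000, 1_000_000)
full_cache = estimate_cost_usd(1_000_000, 1_000_000, cached_fraction=1.0)
```

Without caching, one million input tokens plus one million output tokens come to $18 ($3 + $15). In this simplified model, caching only reduces the input side, so output-heavy workloads see much less than a 90% reduction in total cost.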

✳️ Knowledge Graph

(More about Knowledge Graph…)

%%{init: {'theme':'default'}}%%
graph LR
    A[Foundation Model]:::concept --> B[Claude Sonnet 4.5]:::instance
    B --> C[Agentic AI System]:::concept
    B --> D[Code Generation]:::concept
    B --> E[Computer Use]:::concept

    C --> F[Long-Horizon Tasks]:::concept
    F --> G[Memory Management]:::concept
    F --> H[Context Window Management]:::concept

    B --> I[Claude Agent SDK]:::instance
    I --> J[Tool Use]:::concept
    I --> K[Orchestration]:::concept
    I --> L[Permission System]:::concept

    J --> E
    E --> L

    I --> M[Claude Code]:::instance
    M --> N[Checkpoints]:::concept
    M --> O[VS Code Extension]:::instance
    N --> P[Rollback]:::concept

    B --> Q[Extended Thinking]:::concept
    Q --> R[Reasoning]:::concept
    Q --> S[Test-Time Compute]:::concept

    B --> T[Model Alignment]:::concept
    T --> U[Reduces Sycophancy]:::concept
    T --> V[Reduces Deception]:::concept
    T --> W[Reduces Power-Seeking]:::concept

    B --> X[AI Safety Level 3]:::instance
    X --> Y[Safety Classifiers]:::concept
    Y --> Z[CBRN Detection]:::concept

    AA[Prompt Injection]:::concept --> |threatens| C
    B --> |defends against| AA

    AB[Mechanistic Interpretability]:::concept --> |evaluates| T

    B --> AC[Benchmark Evaluation]:::concept
    AC --> AD[SWE-bench Verified]:::instance
    AC --> AE[OSWorld]:::instance
    AD --> |measures| D
    AE --> |measures| E

    C --> AF[Multi-Agent System]:::concept
    AF --> K

    AG[Human-in-the-Loop]:::concept --> |controls| C

    I --> AH[API Integration]:::concept
    AH --> AI[Amazon Bedrock]:::instance
    AH --> AJ[Google Cloud Vertex AI]:::instance

    classDef concept fill:#FF8000,stroke:#333,stroke-width:2px,color:#000
    classDef instance fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
