Introducing Claude Sonnet 4.5

Post Title Image (Photo by Brett Jordan on Unsplash)

✳️ tl;dr

  • Claude Code introduces a Checkpoints feature that saves progress as you work and lets you roll back to a previous state, addressing a pain point in long-running development work 12
  • A native VS Code extension has been released, showing Claude Code's changes as real-time inline diffs
  • Claude Sonnet 4.5 scores 77.2% on SWE-bench Verified (the full 500 problems) using only simple bash and file-editing tools 13
  • A high-compute configuration with parallel test-time compute reaches 82.0%, using rejection sampling and an internal scoring model to select the best candidate 1
  • The API adds a memory tool and automatic context management, allowing agents to maintain context across long-running tasks 14
  • Pricing remains the same as Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens, with prompt caching saving up to 90% on costs 5
  • Cursor's CEO says Claude Sonnet 4.5 delivers state-of-the-art coding performance on long-horizon tasks, reinforcing why developers choose Claude for their most complex problems 1
  • Scores 61.4% on the OSWorld benchmark, leading all competitors; four months earlier, Sonnet 4 scored only 42.2%
  • Ships under the AI Safety Level 3 (ASL-3) protection framework; the false positive rate of its CBRN threat classifiers has dropped tenfold since they were first described
  • Makes progress in defending against prompt injection attacks, one of the most serious security risks facing agentic AI systems
  • For the first time, the System Card includes an interpretability-based technical assessment, improving the model's transparency and credibility 6
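
The high-compute result in the list above relies on rejecting some parallel attempts and then picking the highest-scoring survivor. The sketch below illustrates that general pattern only; it is not Anthropic's implementation. `Candidate`, `select_best`, `is_acceptable`, and `score` are hypothetical names, and the real rejection criterion and scoring model are not described in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    patch: str  # a proposed change produced by one parallel attempt (hypothetical)


def select_best(
    candidates: Sequence[Candidate],
    is_acceptable: Callable[[Candidate], bool],
    score: Callable[[Candidate], float],
) -> Optional[Candidate]:
    """Drop candidates that fail the filter, then return the top scorer.

    Returns None when every candidate is rejected.
    """
    survivors = [c for c in candidates if is_acceptable(c)]
    if not survivors:
        return None
    return max(survivors, key=score)


# Toy data standing in for real attempts, a real filter, and a real scorer.
attempts = [Candidate("patch-a"), Candidate("patch-b"), Candidate("patch-c")]
toy_scores = {"patch-a": 0.4, "patch-b": 0.9, "patch-c": 0.7}
rejected = {"patch-b"}  # the highest scorer is filtered out before ranking

best = select_best(
    attempts,
    is_acceptable=lambda c: c.patch not in rejected,
    score=lambda c: toy_scores[c.patch],
)
print(best)
```

Filtering before scoring means a candidate the scorer likes but the filter rejects can never win, which is the point of combining the two steps.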

7
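
To make the pricing in the list above concrete, here is a rough cost estimator. It uses the quoted $3 and $15 per million tokens; everything else is an assumption for illustration: the function name `estimate_cost_usd` is hypothetical, the "up to 90%" saving is modeled as a best-case discount on cached input tokens only, and any extra charge for writing to the cache is ignored.

```python
INPUT_USD_PER_MTOK = 3.00    # quoted price per million input tokens
OUTPUT_USD_PER_MTOK = 15.00  # quoted price per million output tokens
MAX_CACHE_DISCOUNT = 0.90    # "up to 90%"; assumed to apply to cached input only


def estimate_cost_usd(input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the cost of one request in USD.

    cached_fraction is the assumed share of input tokens served from the
    prompt cache and billed at the best-case discount.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1 - MAX_CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# One million input tokens and 100,000 output tokens, without and with caching.
no_cache = estimate_cost_usd(1_000_000, 100_000)
full_cache = estimate_cost_usd(1_000_000, 100_000, cached_fraction=1.0)
print(f"no cache: ${no_cache:.2f}, fully cached input: ${full_cache:.2f}")
```

Because output tokens are billed at five times the input rate and are not discounted in this model, caching helps most on workloads with large, repeated prompts and short responses.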

✳️ Knowledge Graph

(More about Knowledge Graph…)

%%{init: {'theme':'default'}}%%
graph LR
    A[Foundation Model]:::concept --> B[Claude Sonnet 4.5]:::instance
    B --> C[Agentic AI System]:::concept
    B --> D[Code Generation]:::concept
    B --> E[Computer Use]:::concept

    C --> F[Long-Horizon Tasks]:::concept
    F --> G[Memory Management]:::concept
    F --> H[Context Window Management]:::concept

    B --> I[Claude Agent SDK]:::instance
    I --> J[Tool Use]:::concept
    I --> K[Orchestration]:::concept
    I --> L[Permission System]:::concept

    J --> E
    E --> L

    I --> M[Claude Code]:::instance
    M --> N[Checkpoints]:::concept
    M --> O[VS Code Extension]:::instance
    N --> P[Rollback]:::concept

    B --> Q[Extended Thinking]:::concept
    Q --> R[Reasoning]:::concept
    Q --> S[Test-Time Compute]:::concept

    B --> T[Model Alignment]:::concept
    T --> U[Reduces Sycophancy]:::concept
    T --> V[Reduces Deception]:::concept
    T --> W[Reduces Power-Seeking]:::concept

    B --> X[AI Safety Level 3]:::instance
    X --> Y[Safety Classifiers]:::concept
    Y --> Z[CBRN Detection]:::concept

    AA[Prompt Injection]:::concept --> |threatens| C
    B --> |defends against| AA

    AB[Mechanistic Interpretability]:::concept --> |evaluates| T

    B --> AC[Benchmark Evaluation]:::concept
    AC --> AD[SWE-bench Verified]:::instance
    AC --> AE[OSWorld]:::instance
    AD --> |measures| D
    AE --> |measures| E

    C --> AF[Multi-Agent System]:::concept
    AF --> K

    AG[Human-in-the-Loop]:::concept --> |controls| C

    I --> AH[API Integration]:::concept
    AH --> AI[Amazon Bedrock]:::instance
    AH --> AJ[Google Cloud Vertex AI]:::instance

    classDef concept fill:#FF8000,stroke:#333,stroke-width:2px,color:#000
    classDef instance fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
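
The diagram above is a directed graph, so it can also be queried in code. The sketch below hand-copies a small subset of its edges (using node labels rather than the Mermaid ids) and finds the shortest chain of relations between two nodes with a breadth-first search. The helper names `build_adjacency` and `shortest_path` are illustrative, not part of any tool mentioned in this post.

```python
from collections import deque

# A hand-copied subset of the diagram's edges, written as (source, target).
EDGES = [
    ("Claude Sonnet 4.5", "Claude Agent SDK"),
    ("Claude Agent SDK", "Claude Code"),
    ("Claude Code", "Checkpoints"),
    ("Claude Code", "VS Code Extension"),
    ("Checkpoints", "Rollback"),
    ("Claude Sonnet 4.5", "Benchmark Evaluation"),
    ("Benchmark Evaluation", "SWE-bench Verified"),
    ("Benchmark Evaluation", "OSWorld"),
]


def build_adjacency(edges):
    """Map each node to the list of nodes it points to."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])  # make sure leaf nodes are present
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node path or None if unreachable."""
    if start not in adjacency:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_adjacency(EDGES)
print(" -> ".join(shortest_path(graph, "Claude Sonnet 4.5", "Rollback")))
```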

✳️ Further Reading