Claude Sonnet 4.5 Arrives

Post Title Image (Photo by Brett Jordan on Unsplash)

✳️ tl;dr

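  • Claude Code adds Checkpoints, which save progress as you go and let you roll back to an earlier state, addressing a pain point of long-running development work [1][2]
  • A native VS Code extension has been released; it shows Claude Code's code changes as real-time inline diffs
  • Claude Sonnet 4.5 scores 77.2% on SWE-bench Verified (the full 500-problem set) using only simple bash and file-editing tools [1][3]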

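  • In a high-compute configuration with parallel test-time compute, the score reaches 82.0%, using rejection sampling and an internal scoring model to select the best candidate [1]
  • The API adds a memory tool and automatic context management, so agents can maintain context across long-running tasks [1][4]
  • Pricing is unchanged from Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens, and prompt caching can cut costs by up to 90% [5]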

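  • Cursor's CEO says Claude Sonnet 4.5 delivers state-of-the-art coding performance on long-horizon tasks and is the first choice for developers solving complex problems [1]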

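  • It reaches 61.4% on the OSWorld benchmark, ahead of all competitors; four months earlier, Sonnet 4 scored only 42.2%
  • It ships under the AI Safety Level 3 (ASL-3) protection framework, and the false-positive rate of CBRN threat detection is ten times lower than when it was first described
  • Defenses against prompt injection attacks have improved; this is one of the most serious security risks facing agentic AI systems
  • For the first time, the System Card includes evaluations based on interpretability techniques, improving the model's transparency and trustworthiness [6]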

[7]
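
To make the pricing bullet concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two list prices quoted above. The function name `estimate_cost_usd` and the `cached_fraction` parameter are hypothetical, and the sketch assumes the best case where every cached input token gets the full "up to 90%" discount. It does not model any extra charge for writing to the cache, so treat the result as an optimistic estimate rather than a bill.

```python
# Prices quoted in the tl;dr above (USD per million tokens).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

# Assumption: cached input tokens receive the full "up to 90%" saving.
MAX_CACHE_DISCOUNT = 0.90


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_fraction: float = 0.0) -> float:
    """Estimate the cost of one workload in USD.

    cached_fraction is the hypothetical share of input tokens that are
    served from the prompt cache (0.0 = nothing cached, 1.0 = everything).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")

    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached

    # Cached tokens are billed at (1 - discount) of the normal input price.
    billable_input = uncached + cached * (1.0 - MAX_CACHE_DISCOUNT)

    input_cost = billable_input * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload: 2M input tokens and 200k output tokens.
    print(f"no caching:  ${estimate_cost_usd(2_000_000, 200_000):.2f}")
    print(f"80% cached:  ${estimate_cost_usd(2_000_000, 200_000, 0.8):.2f}")
```

Note that the discount only applies to the input side. For output-heavy workloads the $15 output price dominates, and caching changes the total much less than the 90% figure suggests.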

✳️ Knowledge Graph

(More about knowledge graphs…)

%%{init: {'theme':'default'}}%%
graph LR
    A[Foundation Model]:::concept --> B[Claude Sonnet 4.5]:::instance
    B --> C[Agentic AI System]:::concept
    B --> D[Code Generation]:::concept
    B --> E[Computer Use]:::concept
    
    C --> F[Long-Horizon Tasks]:::concept
    F --> G[Memory Management]:::concept
    F --> H[Context Window Management]:::concept
    
    B --> I[Claude Agent SDK]:::instance
    I --> J[Tool Use]:::concept
    I --> K[Orchestration]:::concept
    I --> L[Permission System]:::concept
    
    J --> E
    E --> L
    
    I --> M[Claude Code]:::instance
    M --> N[Checkpoints]:::concept
    M --> O[VS Code Extension]:::instance
    N --> P[Rollback]:::concept
    
    B --> Q[Extended Thinking]:::concept
    Q --> R[Reasoning]:::concept
    Q --> S[Test-Time Compute]:::concept
    
    B --> T[Model Alignment]:::concept
    T --> U[Reduces Sycophancy]:::concept
    T --> V[Reduces Deception]:::concept
    T --> W[Reduces Power-Seeking]:::concept
    
    B --> X[AI Safety Level 3]:::instance
    X --> Y[Safety Classifiers]:::concept
    Y --> Z[CBRN Detection]:::concept
    
    AA[Prompt Injection]:::concept --> |threatens| C
    B --> |defends against| AA
    
    AB[Mechanistic Interpretability]:::concept --> |evaluates| T
    
    B --> AC[Benchmark Evaluation]:::concept
    AC --> AD[SWE-bench Verified]:::instance
    AC --> AE[OSWorld]:::instance
    AD --> |measures| D
    AE --> |measures| E
    
    C --> AF[Multi-Agent System]:::concept
    AF --> K
    
    AG[Human-in-the-Loop]:::concept --> |controls| C
    
    I --> AH[API Integration]:::concept
    AH --> AI[Amazon Bedrock]:::instance
    AH --> AJ[Google Cloud Vertex AI]:::instance
    
    classDef concept fill:#FF8000,stroke:#333,stroke-width:2px,color:#000
    classDef instance fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff

✳️ Further Reading