Meet Claude 3.7: The AI That Pauses Before Speaking (Finally)

Anthropic just launched Claude 3.7 Sonnet, their most intelligent model yet and the first hybrid reasoning AI on the market. Unlike its predecessors, this model can either respond instantly or take its sweet time thinking through complex problems.

Claude 3.7 Sonnet offers two distinct operating modes. Standard mode delivers quick responses like traditional AI models. Extended thinking mode allows Claude to work through problems step-by-step, showing its reasoning process directly to users.

Most competing reasoning models force you to choose between quick responses or deep thinking. Claude 3.7 unifies both capabilities into a single model. Users simply toggle between standard mode for quick answers and extended thinking mode when more cognitive horsepower is needed.

API users gain even more control. They can set a specific "thinking budget," telling Claude exactly how long to ponder a question. This creates a flexible tradeoff between speed, cost, and answer quality.

Benchmark Performance

Benchmark tests show Claude 3.7 Sonnet crushing the competition in real-world tasks. It scored state-of-the-art results on SWE-Bench Verified, which evaluates AI models' ability to solve real-world software issues.

For retail-related tasks with TAU-bench, Claude also achieved top performance. This benchmark framework tests AI agents on complex real-world tasks with user and tool interactions.

The improvement isn't just about raw intelligence. Claude 3.7 shows remarkable gains in instruction-following, general reasoning, multimodal capabilities, and agentic coding. Extended thinking mode provides a notable boost in math and science tasks.

Claude Code: An Agentic Coding Tool

Alongside the model, Anthropic introduced Claude Code, an agentic coding tool. This terminal-based assistant is available as a limited research preview and enables developers to delegate substantial engineering tasks to Claude directly from their terminal.

Claude Code is an active collaborator that can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools—keeping developers in the loop at every step.

Early testing showed impressive results. Claude Code completed tasks in a single pass that would normally take 45+ minutes of manual work, reducing development time and overhead.

Improved GitHub Integration

Anthropic has also enhanced the coding experience on Claude.ai. Their GitHub integration is now available on all Claude plans—enabling developers to connect their code repositories directly to Claude.

With a deeper understanding of personal, work, and open source projects, Claude 3.7 Sonnet becomes a more powerful partner for fixing bugs, developing features, and building documentation across important GitHub projects.

Early Industry Validation

Several tech companies have already validated Claude's improved capabilities. Cursor noted Claude is once again best-in-class for real-world coding tasks, with significant improvements in areas ranging from handling complex codebases to advanced tool use.

Cognition found it far better than any other model at planning code changes and handling full-stack updates. Vercel highlighted Claude's exceptional precision for complex agent workflows, while Replit has successfully deployed Claude to build sophisticated web apps and dashboards from scratch, where other models stall.

In Canva's evaluations, Claude consistently produced production-ready code with superior design taste and drastically reduced errors.

Availability and Pricing

Claude 3.7 Sonnet is available across all Claude plans—including Free, Pro, Team, and Enterprise—plus the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. The extended thinking mode is available on all surfaces except the free Claude tier.

Pricing holds steady at $3 per million input tokens and $15 per million output tokens. This includes any tokens used during the thinking process.

Safety and Responsibility

Anthropic conducted extensive testing and evaluation of Claude 3.7 Sonnet, working with external experts to ensure it meets their standards for security, safety, and reliability.

The new model makes more nuanced distinctions between harmful and benign requests, reducing unnecessary refusals by 45% compared to its predecessor. A detailed system card covers new safety results in several categories, providing a breakdown of their Responsible Scaling Policy evaluations.

Future Development

Anthropic plans to continually improve Claude Code based on usage: enhancing tool call reliability, adding support for long-running commands, improved in-app rendering, and expanding Claude's own understanding of its capabilities.

Their goal with Claude Code is to better understand how developers use Claude for coding to inform future model improvements.

Why this matters:

Anthropic's unified approach treats reasoning as an integrated capability rather than requiring separate models for quick responses versus deep thinking.
By making AI reasoning visible and controllable, Claude 3.7 Sonnet gives users unprecedented transparency into how AI systems reach their conclusions.
The introduction of agentic coding tools signals a shift toward AI systems that can work autonomously on complex tasks while still collaborating effectively with humans.

Rise of 'Vibe Coding': How AI Is Reshaping Software Development

Palo Alto Networks Plans to Buy Security Startup Protect AI for $500M+

Secret AI Study Shows Bots Can Change Minds Better Than Humans