
Claude Code's auto mode uses a Sonnet 4.6 classifier to evaluate action safety instead of requiring explicit approval

Insight: Claude Code's auto mode uses Claude Sonnet 4.6 as a separate classifier that evaluates each action's safety before execution. It blocks actions that escalate beyond the task's scope, target unrecognized infrastructure, or appear hostile. By default, it allows local file operations within the project scope, read-only API requests, and installation of declared packages, and it blocks force pushes and mass deletions.

Detail: Auto mode reduces friction by eliminating per-action approval requirements. However, it relies on AI-based classification that is inherently non-deterministic. Willison expresses skepticism about AI-based prompt injection protection, arguing that deterministic sandboxing through infrastructure (file access restrictions, network controls) provides more reliable security than LLM-based evaluation.
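
To make the contrast concrete, here is a minimal Python sketch of what a deterministic check might look like: a path-containment test and a host allowlist. It is an illustration only, not how Claude Code or any real sandbox is implemented. The names `PROJECT_ROOT`, `ALLOWED_HOSTS`, `path_allowed`, and `host_allowed` are hypothetical, and real sandboxing would enforce these rules at the OS, container, or network layer rather than in application code. The point is only that the same input always yields the same verdict.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical values, chosen only for illustration.
PROJECT_ROOT = Path("/work/project")
ALLOWED_HOSTS = frozenset({"pypi.org", "api.github.com"})


def path_allowed(candidate: str, root: Path = PROJECT_ROOT) -> bool:
    """Return True only if `candidate` resolves to a location inside `root`.

    Resolving first collapses ".." segments and follows symlinks, so a
    path cannot escape the root by traversal.
    """
    resolved_root = root.resolve()
    resolved = (resolved_root / candidate).resolve()
    return resolved.is_relative_to(resolved_root)  # Python 3.9+


def host_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose exact hostname is allowlisted."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed
```

Under these assumptions, `path_allowed("src/main.py")` passes while `path_allowed("../../etc/passwd")` does not, and `host_allowed("https://pypi.org/simple/")` passes while a lookalike such as `https://pypi.org.evil.example/` is rejected. No model judgment is involved, which is the property the sandboxing argument relies on.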

Sources