AI-resistant evaluations emphasize novel problem-solving over pattern recognition

Insight: AI-resistant evaluations move away from well-documented optimization challenges toward out-of-distribution problems inspired by constrained instruction sets (e.g., Zachtronics puzzles). Removing debugging tools and visualization forces candidates to develop novel approaches. Extended time horizons favor human expertise: given enough time, skilled humans retain an advantage over models. Humans win through novel reasoning rather than pattern-matching, so evaluations that simulate genuinely novel work pose harder challenges.
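To make the "constrained instruction set" idea concrete, here is a toy register machine in the spirit of Zachtronics-style puzzles. This is an illustrative sketch, not a design from the source: the instruction set (`INC`, `DEC`, `JNZ`, `HALT`) and the sample puzzle are hypothetical. Candidates must express algorithms in this deliberately impoverished language, with no debugger or visualizer provided.

```python
def run(program, registers, max_steps=10_000):
    """Execute a list of (op, *args) tuples on a tiny register machine.

    Ops: INC r (increment), DEC r (decrement),
         JNZ r addr (jump to addr if register r is nonzero), HALT.
    """
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        if op == "INC":
            registers[args[0]] += 1
        elif op == "DEC":
            registers[args[0]] -= 1
        elif op == "JNZ":
            if registers[args[0]] != 0:
                pc = args[1]
                steps += 1
                continue
        elif op == "HALT":
            break
        pc += 1
        steps += 1
    return registers

# Sample puzzle: add register "a" into "b" using only these four ops.
prog = [
    ("JNZ", "a", 2),   # skip the HALT if there is work to do
    ("HALT",),
    ("DEC", "a"),      # move one unit from a to b per loop iteration
    ("INC", "b"),
    ("JNZ", "a", 2),
    ("HALT",),
]
```

Because the instruction set is novel and undocumented, solutions depend on reasoning about the machine's semantics rather than recalling published patterns.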

Detail: Rather than banning AI, the approach iteratively raises problem difficulty: running Claude Opus 4 against candidate problems and keeping those where it struggles establishes the new baseline. Building one's own debugging tools becomes part of what is evaluated, rather than scaffolding provided in advance.
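The difficulty-raising loop above can be sketched as a filter: probe each candidate problem with a frontier model and retain only the problems it fails. The function names and the difficulty-based stand-in for a real model attempt are illustrative assumptions, not code from the source.

```python
def model_solves(problem, capability=0.6):
    # Stand-in for an actual model attempt (hypothetical): a problem is
    # "solved" when its difficulty is within the probe model's capability.
    return problem["difficulty"] <= capability

def raise_baseline(candidates, probe=model_solves):
    """Keep only problems the probe model fails; these form the eval set."""
    return [p for p in candidates if not probe(p)]

# Toy pool of candidate problems with difficulty scores in [0, 1).
pool = [{"name": f"p{i}", "difficulty": i / 10} for i in range(10)]
eval_set = raise_baseline(pool)
```

In practice the probe would be an actual model run with transcripts inspected for struggle points, and the loop would repeat as models improve, so the baseline keeps moving.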
