AI search tools differ significantly in hallucination rates and instruction-following accuracy
Insight: According to the author, Grok 4.20 achieved a 22% hallucination rate and 83% instruction-following accuracy (IFEBench), with an Elo score of 1226 on the LMArena Search Arena. The tool runs four AI agents simultaneously and has access to real-time X/Twitter data (~68M English tweets daily). The author argues that the choice of tool should be driven by personal workflow needs rather than by rankings alone.
Detail: The author provides step-by-step customization instructions and prompt templates for use cases including competitive intelligence, sales prep, hiring, and financial research. The article recommends testing each tool yourself instead of trusting published rankings.
Sources
- Ruben Hassid (How to AI) — "Search" (2026-03-22)