AI coding quality is a skill issue — bad prompts produce bad code, not bad models

Insight: When AI-generated code is bad or useless, it is often a prompting skill issue rather than a model limitation. LLMs can produce code at every level of quality; it is the user's instructions that set the defaults. The familiar automation framework ("a person did role X; X gets automated, so now they do role Y") doesn't apply to AI, because AI may be better at both X and Y. The current chat-on-IDE interface (Copilot, Cursor, v0) is a clunky transitional UX that will evolve as models and codebases adapt.

Detail: Shrivu (who consumes billions of tokens/month for codegen at his company) makes several specific observations:

(1) Sonnet 3.x remains "in a league of its own" for code generation as of early 2025.
(2) Many engineers suffer from the IKEA effect: overvaluing hand-written code.
(3) Insecure code from AI is a real problem, but one that will be addressed through security benchmarks that model providers compete on, model-as-reviewer patterns, and automated theorem proving.
(4) The current IDE interface gives a false impression that "AI will write your code for you" while dumping large amounts of code that requires expertise to review; as models evolve, AI IDEs will no longer look like IDEs.
(5) LLM intelligence doesn't fully align with human intelligence: models may still make token-counting mistakes while becoming "superintelligent" in other ways.
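The model-as-reviewer pattern in (3) can be sketched minimally. Note this is an illustrative sketch, not Shrivu's actual setup: `call_model`, the rubric wording, and the retry loop are all assumptions, with the LLM call stubbed out so the example runs standalone.

```python
from typing import Optional

def call_model(prompt: str) -> str:
    # Hypothetical stub standing in for a real LLM API call;
    # the verdict is hard-coded so the sketch runs offline.
    return "REJECT: query is built by string concatenation (SQL injection risk)"

def review_code(code: str) -> tuple[bool, str]:
    """Ask a reviewer model to approve or reject generated code."""
    prompt = (
        "You are a security reviewer. Reply 'APPROVE' or 'REJECT: <reason>'.\n"
        f"Code under review:\n{code}"
    )
    verdict = call_model(prompt)
    return verdict.startswith("APPROVE"), verdict

def generate_with_review(code: str, max_attempts: int = 3) -> Optional[str]:
    """Gate generated code behind a reviewer model before accepting it."""
    for _ in range(max_attempts):
        approved, _verdict = review_code(code)
        if approved:
            return code
        # In a real loop, the REJECT reason would be fed back to the
        # generator model to produce a revised candidate.
    return None

generated = 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)'
approved, verdict = review_code(generated)
print(approved, verdict)
```

The point of the pattern is that the reviewer model acts as an automated gate, so insecure output is caught before a human ever has to read it.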

Sources