Agent self-improvement is the hard problem after agent building
Insight: Most AI agents don't learn from experience, despite being built on machine learning. The gap exists because fine-tuning LLMs requires conversational example pairs (not just raw knowledge), reinforcement learning needs realistic environments and significant compute, and training on user data risks privacy leaks. Meanwhile, in-context (prompt-based) learning makes feedback costs quadratic: every past interaction consumes tokens in every future one. Prompt caching helps, but personalization breaks cache reuse. The next differentiator for AI products will be reliable, secure self-improvement.
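The quadratic claim above follows from simple arithmetic, sketched here under an assumed fixed feedback size per interaction (`FEEDBACK_TOKENS` is a made-up constant, not a measured figure):

```python
# Sketch: why in-context feedback cost grows quadratically.
# Assumption: each interaction appends FEEDBACK_TOKENS of learned
# feedback to the prompt, and every later call re-sends all of it.
FEEDBACK_TOKENS = 200  # hypothetical tokens of feedback per interaction

def cumulative_feedback_tokens(n_interactions: int) -> int:
    """Total feedback tokens sent across n interactions when each
    call carries all previously accumulated feedback."""
    total = 0
    accumulated = 0
    for _ in range(n_interactions):
        total += accumulated      # this call re-sends all past feedback
        accumulated += FEEDBACK_TOKENS
    return total

# Doubling the interaction count roughly quadruples cumulative cost:
print(cumulative_feedback_tokens(100))  # 990000
print(cumulative_feedback_tokens(200))  # 3980000
```

The sum is an arithmetic series, so total cost is O(n²) in the number of interactions, which is the cost curve the note attributes to prompt-based learning.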
Detail: The toolbox for agent self-improvement has two parts: (1) knowledge improvement, persisting user context via memory systems; and (2) behavior improvement, learning to solve problems more effectively. The two are distinct but interrelated. The core tension: machine learning was supposed to "learn from experience with respect to some class of tasks," yet GPT wrappers built on ML are effectively static, because retraining a general next-token-prediction model is far harder than retraining a task-specific model. Anthropic's memory system, Cursor's personalization, and various "memory bank" approaches are early attempts at bridging this gap.
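The "knowledge improvement" half can be sketched as a store of distilled facts injected into prompts, so cost scales with the distilled memory rather than with transcript length. All names here (`MemoryStore`, `remember`, `recall`) are illustrative, not any specific product's API:

```python
# Minimal sketch of "knowledge improvement": persist distilled user
# context between sessions instead of replaying raw transcripts.
import json
import os

class MemoryStore:
    def __init__(self, path: str):
        self.path = path
        self.facts: dict[str, str] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)

    def remember(self, key: str, value: str) -> None:
        """Distill an interaction into a keyed fact and persist it."""
        self.facts[key] = value
        with open(self.path, "w") as f:
            json.dump(self.facts, f)

    def recall(self) -> str:
        """Render stored facts as a compact prompt preamble; size
        tracks the number of facts, not the conversation history."""
        return "\n".join(f"- {k}: {v}" for k, v in self.facts.items())

store = MemoryStore("user_memory.json")
store.remember("editor", "prefers tabs over spaces")
store.remember("language", "writes mostly Rust")
print(store.recall())
```

This sketch sidesteps the quadratic prompt growth described above, at the cost of a distillation step that decides which facts are worth keeping.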
Sources
- Shrivu Shankar — "How to Train Your GPT Wrapper" (2025-06-28)