One Year of AI Agents: What They Got Right, What They Got Wrong, and What’s Coming Next

A year ago, the conventional wisdom about AI agents was that they were impressive in demos and unreliable in practice — systems that could complete simple, well-defined tasks but fell apart the moment anything unexpected happened. That conventional wisdom has not been entirely wrong, but it has been substantially revised by twelve months of real-world deployment.

The tasks where agents have become genuinely useful are specific and share common characteristics: they involve sequences of well-defined steps, the output of each step is verifiable, and errors are recoverable without catastrophic consequences. Software testing, data extraction, document processing, and basic research tasks all fit this profile. Agents deployed in these domains are now handling workloads that previously required significant human time.

The tasks where agents have consistently underperformed expectations are equally instructive: anything requiring sustained judgment about ambiguous situations, tasks where the cost of errors is high, and work that depends on understanding unspoken social or organisational context. Several high-profile agent failures in 2025 have made enterprises cautious about autonomous deployment in sensitive domains. In one, a financial services company lost a significant sum when an agent misinterpreted an ambiguous instruction; in another, a healthcare provider faced regulatory scrutiny after an agent sent communications it was not authorised to send.

“The companies that have gotten the most value from agents are the ones that have been thoughtful about human-in-the-loop design,” said Yann LeCun in a widely read interview earlier this year. “The companies that have had problems are the ones that trusted the systems more than the systems deserved.”

What nobody predicted is how quickly the developer ecosystem has grown. A year ago, building an agent required significant custom engineering. Today, there are dozens of frameworks, hundreds of pre-built tool integrations, and a growing body of best practices for agent design. The infrastructure is maturing rapidly. Whether the underlying models are improving fast enough to justify the expanding ambitions of the applications built on top of them is the central question of 2026.
