OpenAI’s o3 model, unveiled in December 2024, represents the clearest demonstration yet of how far artificial intelligence has come, and of the uncomfortable economic questions that come with that progress.
On ARC-AGI, a benchmark designed to test general reasoning and considered one of the hardest problems in AI evaluation, o3 scored 87.5 percent. The previous best from any AI system was 53 percent. On competition mathematics problems, it answered 96.7 percent of questions correctly.
“We did not expect this to work,” said one OpenAI researcher. “The scaling laws held in a direction we hadn’t fully anticipated.”
But the performance comes at a cost that is, for most use cases, prohibitive. In its most capable configuration, o3 reportedly uses more than a thousand times as much compute per query as GPT-4o. Early estimates suggested that solving a single hard reasoning problem could cost anywhere from $20 to several hundred dollars in API fees.
This creates a bifurcated AI landscape that some researchers find troubling. The models capable of the most transformative work are accessible only to well-funded organisations. Meanwhile, the cheap, widely available models that most people use are significantly less capable.