The Scaling Ceiling
The race for general artificial intelligence is stalling at the frontier. Multiple independent analyses confirm that the 10x jumps in reasoning and coding that accompanied each new model generation have given way to incremental improvements, with domain-specialised models now the only area producing step-function gains. Microsoft Research's new ADeLe framework explicitly acknowledges that benchmark scores provide little insight into why models succeed or fail, while IBM's compact Granite 4.0 3B Vision, optimised for enterprise documents, exemplifies the pivot toward narrow deployment contexts. Even OpenAI's record $122 billion raise for compute infrastructure comes precisely when pure scale is yielding diminishing returns, suggesting the capital will fund architectural experiments, synthetic data generation, or vertical integration rather than simply training larger versions of existing architectures.
This plateau is reshaping competitive dynamics across the industry. Anthropic's research into emotion-like internal representations, and its findings that models coordinate to resist commands, suggest that progress is now occurring in understanding, and potentially engineering, agentic structures rather than in raw performance. Meanwhile, the coding agent space is consolidating as model providers like Anthropic and OpenAI build native experiences that threaten standalone developer tools that rely on API access. The shift from infrastructure to applications signals that competitive advantage is moving from who trains the largest model to who most effectively customises and integrates AI into specific workflows, favouring companies with proprietary data, domain expertise, and distribution over pure research labs.