The AI Infrastructure War Shifts From Training to Inference
Three converging data points this week crystallise a structural shift: xAI's 220,000-GPU Colossus 1 supercluster, having failed as a training system due to mixed-architecture inefficiencies, has been leased in its entirety to Anthropic for inference workloads; Cerebras, whose entire product thesis is inference-optimised wafer-scale silicon, debuted on the public markets at a $67 billion valuation; and SambaNova's CEO used Cerebras' IPO moment to publicly frame inference cost-per-token as the real competitive frontier. The DRAM ETF reaching $10 billion in assets at record pace reflects institutional money rotating away from training silicon and toward memory bandwidth, the binding constraint on inference at scale.
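To make the bandwidth claim concrete: in autoregressive decoding, every generated token requires streaming the model's weights out of memory, so single-stream throughput is capped by bandwidth rather than by compute. A minimal back-of-envelope sketch of that ceiling follows; the 70B-parameter model, 8-bit weights, and ~3.35 TB/s figure (roughly an H100-class part) are illustrative assumptions, not quoted specs.

```python
# Back-of-envelope sketch of why autoregressive decode is memory-bandwidth-bound.
# All numbers are illustrative assumptions, not vendor-published benchmarks.

def decode_tokens_per_sec(params_billion: float, bytes_per_param: float,
                          mem_bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode throughput: each generated token
    requires streaming every weight from memory at least once, so throughput
    is capped at bandwidth divided by model size in bytes."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return (mem_bandwidth_tb_s * 1e12) / model_bytes

# A hypothetical 70B-parameter model in 8-bit weights on an accelerator
# with ~3.35 TB/s of HBM bandwidth:
print(f"{decode_tokens_per_sec(70, 1.0, 3.35):.0f} tokens/s ceiling")  # ~48
```

Batching amortises each weight read across many concurrent requests, which is precisely why memory capacity and bandwidth, not peak FLOPs, dominate inference economics.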
The commercial logic extends beyond hardware. OpenAI's Plaid integration, Amazon's full deployment of Alexa as its core search interface, and OpenAI's organisational pivot to agents are all inference-layer events: each deploys already-trained models into real-world execution contexts at massive scale. Training compute determines what models can do; inference infrastructure determines who captures the economic value from doing it. Capital allocators tracking inference cost-per-token across major providers now have a more reliable leading indicator than training benchmark comparisons.
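For readers who want to track that indicator themselves, cost-per-token reduces to a ratio of hardware cost to sustained throughput. A minimal sketch, using hypothetical instance pricing and throughput figures rather than any provider's published numbers:

```python
# Minimal sketch of the cost-per-token metric described above.
# The hourly price and throughput are hypothetical placeholder values.

def cost_per_million_tokens(gpu_hour_usd: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """Serving cost per million output tokens for one accelerator, ignoring
    utilisation gaps, prompt processing, and networking overhead."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_usd / tokens_per_hour * 1e6

# e.g. a $3.50/hr accelerator sustaining 600 tok/s of batched decode:
print(f"${cost_per_million_tokens(3.50, 600):.2f} per 1M tokens")  # ~$1.62
```

The metric's sensitivity to batched throughput is the point: providers that extract more tokens per second from the same silicon compound their cost advantage with every request served.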