Startup Subquadratic says it has solved a decade-old maths problem limiting LLM speed and cost. Here is what the evidence shows so far and what to watch.

AI startup Subquadratic came out of stealth last month with a significant claim: it has solved a mathematical bottleneck that has constrained large language models for nearly a decade. The company says its approach cuts the number of computations transformers need to generate answers, producing a model that is faster, cheaper, and more energy-efficient than existing options. Experts were broadly skeptical at first, but Subquadratic has since started sharing supporting data, according to MIT Technology Review.
The claim is a bold one: Subquadratic says it has cracked a maths problem that has held back large language models for roughly ten years.
The core of the argument is about computation scaling. Standard transformer architectures, which power virtually every major LLM today, rely on self-attention, where the number of calculations grows quadratically with input length: doubling the input roughly quadruples the attention work. Subquadratic says its approach cuts those calculations significantly, which in theory means the resulting models run faster, cost less to operate, and draw less energy than competing models currently on the market.
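To make the scaling argument concrete, here is a minimal Python sketch that counts the query-key scores full self-attention computes at different input lengths. The `n_log_n_pairs` curve is purely a hypothetical comparison point chosen for illustration; Subquadratic has not, as far as this article knows, said its method scales that way, and the function names are our own.

```python
import math


def full_attention_pairs(n_tokens: int) -> int:
    """Scores computed by standard full self-attention: one per (query, key) pair."""
    return n_tokens * n_tokens


def n_log_n_pairs(n_tokens: int) -> float:
    """HYPOTHETICAL subquadratic curve, n * log2(n), used only for comparison.

    This is an assumption for illustration, not Subquadratic's published method.
    """
    return n_tokens * math.log2(n_tokens)


def growth_table(lengths):
    """Return (length, quadratic count, rounded n*log2(n) count) for each length."""
    return [
        (n, full_attention_pairs(n), round(n_log_n_pairs(n)))
        for n in lengths
    ]


if __name__ == "__main__":
    for n, quad, sub in growth_table([1_024, 2_048, 4_096, 8_192]):
        print(f"{n:>6} tokens: quadratic={quad:>12,}  n*log2(n)={sub:>10,}")
```

Each doubling of the input multiplies the quadratic count by four, while the illustrative curve only slightly more than doubles. That widening gap at long inputs is why any credible subquadratic method would matter most for long-context workloads.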
When the announcement first landed, many researchers pushed back. Claims of fundamental architectural breakthroughs are common in AI, and most do not hold up at scale. However, Subquadratic has since begun releasing data to back the claim, and according to MIT Technology Review, that evidence is starting to attract genuine attention from experts who were initially unconvinced.
The transformer bottleneck is not an abstract research problem. It is the reason running LLMs at scale costs so much and consumes so much electricity. Any reduction in per-token computation translates directly into lower inference costs for every business using these models via API or running them in-house.
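As a rough illustration of how a compute reduction could show up on an invoice, the sketch below assumes that inference price scales linearly with compute. That is a simplification, and every number in it (token volume, price per million tokens, the 40% reduction) is hypothetical rather than a figure from Subquadratic or any provider.

```python
def inference_cost(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def cost_after_compute_cut(baseline_cost: float, compute_reduction: float) -> float:
    """Cost after cutting compute by `compute_reduction` (a fraction from 0 to 1).

    ASSUMPTION: price falls in direct proportion to compute. Real pricing also
    reflects memory, hardware, and vendor margin, so treat this as an upper bound.
    """
    if not 0.0 <= compute_reduction <= 1.0:
        raise ValueError("compute_reduction must be between 0 and 1")
    return baseline_cost * (1.0 - compute_reduction)


if __name__ == "__main__":
    # Hypothetical workload: 500 million tokens a month at 2.00 per million.
    baseline = inference_cost(500_000_000, 2.00)
    reduced = cost_after_compute_cut(baseline, 0.40)
    print(f"baseline: {baseline:,.2f}  after hypothetical 40% cut: {reduced:,.2f}")
```

Swapping in your own monthly token volume and contracted price gives a ceiling on what a given efficiency claim could be worth to you if it were passed through in full.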
If the Subquadratic approach holds up under scrutiny, that has direct consequences for anyone budgeting for AI-powered features in their products, since lower compute per token means lower inference bills.
It also matters for the broader AI market. If one startup can demonstrate a credible architectural improvement, it puts pressure on the larger labs to respond, and it raises questions about whether the current generation of transformer-based models is closer to its efficiency ceiling than previously assumed.
Be interested, but do not reorganise your AI budget around a startup’s stealth-exit press cycle. The pattern here is familiar: a bold claim, initial skepticism, then a selective data release timed to keep the story alive. That is not evidence of bad faith; it is simply how early-stage AI companies build credibility. What counts now is the quality of the evidence.
What would actually move the needle for us is independent replication. Has anyone outside Subquadratic run the same benchmarks? Have the results been published in a form that other researchers can stress-test? MIT Technology Review notes the evidence is worth attention, which is a meaningful signal from a publication that is usually careful with that kind of language. But “worth attention” is not the same as “verified at production scale.”
For businesses currently spending on LLM inference, the practical step is to watch which major API providers start referencing this architecture in their model release notes. That is when the claims become relevant to your invoices.
You do not need to act today, but you should set a reminder to check back in 60 to 90 days. If Subquadratic’s approach is real, other labs will either adopt it or publish rebuttals, and that secondary reaction will tell you more than the original announcement. If you are currently locked into long-term infrastructure contracts based on current LLM compute costs, flag this to your vendor as a conversation worth having at renewal time.