Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof.

May 5, 2026

Posted 3 hours ago by VentureBeat

A little-known Miami-based startup called Subquadratic emerged from stealth on Tuesday with a sweeping claim: that it has built the first large language model to fully escape the mathematical constraint that has defined, and limited, every major AI system since 2017.

The company claims its first model, SubQ 1M-Preview, is the first LLM built on a fully subquadratic architecture, one where compute grows linearly with context length.

If that claim holds, it would be a genuine inflection point in how AI systems scale. At 12 million tokens, the company says, its architecture reduces attention compute by almost 1,000 times compared to other frontier models, a figure that, if validated independently, would dwarf the efficiency gains of any existing approach.

The company is also launching three products into private beta: an API exposing the full context window, a command-line coding agent called SubQ Code, and a search tool called SubQ Search. It has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex. The New Stack reported that the raise values the company at $500 million.

The numbers Subquadratic is publishing are extraordinary. The reaction from the AI research community has been, to put it mildly, mixed, ranging from genuine curiosity to open accusations of vaporware. Understanding why requires understanding what the company claims to have solved, and why so many prior attempts to solve the same problem have fallen short.

The quadratic scaling problem has shaped the economics of the entire AI industry

Every transformer-based AI model, which includes virtually every frontier system from OpenAI, Anthropic, Google, and others, relies on an operation called attention. Every token is compared against every other token, so as inputs grow, the number of interactions, and the compute required to process them, scales quadratically. In plain terms: double the input size, and the cost doesn't double. It quadruples.

This relationship has shaped what gets built and what doesn't. The industry standard is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. Even at those sizes, the cost of processing long inputs becomes punishing.
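
To make the quadratic relationship concrete, here is a minimal Python sketch that simply counts token-to-token comparisons in dense attention. It illustrates the arithmetic only: the function names are hypothetical, and real costs also depend on model width, layer count, and hardware, all of which this ignores.

```python
def pairwise_comparisons(num_tokens: int) -> int:
    """Comparisons in dense attention: every token against every token."""
    return num_tokens * num_tokens


def quadratic_cost_ratio(small: int, large: int) -> float:
    """How much more attention work the larger input needs under quadratic scaling."""
    return pairwise_comparisons(large) / pairwise_comparisons(small)


def linear_cost_ratio(small: int, large: int) -> float:
    """The same comparison if cost grew only in proportion to input length."""
    return large / small


if __name__ == "__main__":
    # Doubling the input quadruples the quadratic cost.
    print(quadratic_cost_ratio(128_000, 256_000))    # 4.0
    # 128,000 tokens versus 1 million tokens.
    print(quadratic_cost_ratio(128_000, 1_000_000))  # 61.03515625
    print(linear_cost_ratio(128_000, 1_000_000))     # 7.8125
```

Under these simplifying assumptions, moving from 128,000 to 1 million tokens multiplies the comparison count by about 61, against about 7.8 if cost grew linearly.
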
The industry built an elaborate stack of workarounds to cope. RAG systems use a search engine to pull a small number of relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering techniques, and multi-agent orchestration systems on top of models, all to route around the fundamental constraint that the model itself can't efficiently process everything at once.

Subquadratic's argument is that these workarounds are expensive, brittle, and ultimately limiting. As CTO Alexander Whedon told SiliconANGLE in an interview, "I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows. And I think that that is kind of a waste of human intelligence and also limiting to the product quality."

Subquadratic's fix is deceptively simple: stop doing the math that doesn't matter

The company's approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most of the token-to-token comparisons in standard attention are wasted compute. Instead of comparing every token to every other token, SSA learns to identify which comparisons actually matter and computes attention only over those positions. Crucially, the selection is content-dependent: the model decides where to look based on meaning, not on fixed positional patterns. This allows it to retrieve specific information from arbitrary positions across a very long context without paying the quadratic tax.

The practical payoff scales with context length, exactly the inverse of the problem it's trying to solve. According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. As Whedon put it: "If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice."
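
The idea of content-dependent selection can be sketched in a few lines of Python. This is not Subquadratic's SSA, whose internals this article does not describe. It is a generic top-k sparse attention toy with hypothetical function names. One important caveat: the toy scores every query against every key before keeping the top k, so it is still quadratic overall. A genuinely subquadratic method would need a way to pick positions without scoring all pairs.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest scores.

    Assumption for illustration: selection is done by dense scoring,
    which is NOT subquadratic. Only the softmax and the weighted sum
    are restricted to the k selected positions.
    """
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [dot(q, key) for key in keys]
        # Content-dependent choice: positions are picked by score, not by distance.
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in keep])
        outputs.append(
            [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    # With k=1 the query attends only to its best-matching key.
    print(topk_sparse_attention([[1.0, 0.0]], keys, values, k=1))  # [[10.0, 0.0]]
```

Setting k equal to the number of keys recovers ordinary dense attention, which is a convenient sanity check for a sketch like this.
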
The company says it trained the model in three stages (pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures), teaching the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems.

Three benchmarks paint a strong picture, but what they leave out may matter more

On the surface, SubQ's benchmark numbers are competitive with or superior to models built by organizations spending billions of dollars. On SWE-Bench Verified, it scored 81.8 compared to Opus 4.6's 80.8 and DeepSeek 4.0 Pro's 80.0. On RULER at 128,000 tokens, a standard benchmark for reasoning over extended inputs, SubQ scored 95, edging out Claude Opus 4.6 at 94.8. On MRCR v2, a demanding test of multi-hop retrieval across long contexts, SubQ posted a third-party verified score of 65.9, compared with Claude Opus 4.7 at 32.2, GPT-5.5 at 74, and Gemini 3.1 Pro at 26.3.

But several details warrant scrutiny. The benchmark selection is narrow: exactly three tests, all emphasizing long-context retrieval and coding, the precise tasks SubQ is designed for. Broader evaluations across general reasoning, math, multilingual performance, and safety have not been published. The company says a comprehensive model card is coming soon.

According to The New Stack, each benchmark model was run only once due to high inference cost, and the SWE-Bench margin is, as the company's own paper acknowledges, "harness as much as model." In benchmark methodology, single runs without confidence intervals leave room for variance. There is also a significant gap between SubQ's research results and its production model. On MRCR v2, the company reported a research score of 83, but the third-party verified production model scored 65.9.
That 17-point gap between the lab result and the shipping product is notable and largely unexplained.

Subquadratic also told SiliconANGLE that on the RULER 128K benchmark, SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus, a remarkable cost claim. But the company has not publicly disclosed specific API pricing, making it impossible to independently verify the cost-per-task comparisons.

The AI research community's verdict ranges from 'genuine breakthrough' to 'AI Theranos'

Within hours of the announcement, the AI research community erupted into a debate that crystallized around a single question: Is this real?

AI commentator Dan McAteer captured the binary mood in a widely shared post: "SubQ is either the biggest breakthrough since the Transformer... or it's AI Theranos." The comparison to the infamous blood-testing fraud company may be unfair, but it reflects the scale of the claims being made. Skeptics zeroed in on several pressure points. Prominent AI engineer Will Depue initially noted that SubQ is "almost surely a sparse attention finetune of Kimi or DeepSeek," referring to existing open-source models.

Whedon confirmed this on X, writing that the company is using weights from open-source models as a starting point, "as a function of our funding and maturity as a company." Depue later escalated his criticism, writing that the company's O(n) scaling claims and the speedup numbers "don't seem to line up" and calling the communication "either incredibly poorly communicated or just not real."

Others raised structural questions. One developer noted that if SubQ truly reduces compute by 1,000x and costs less than 5% of Opus, the company should have no trouble serving it at scale, so why gate access through an early-access program? Developer Stepan Goncharov called the benchmarks "very interesting cherry-picked benchmarks," while another commenter described them as "suspiciously perfect."

But not everyone was dismissive.
AI researcher John Rysana pushed back on the Theranos framing, writing that the work is "just subquadratic attention done well which is very meaningful for long context workloads," and that odds of it being BS are "extremely low." Linus Ekenstam, a tech commentator, said he was "extremely intrigued" to see the real-world implications, particularly for complex AI-powered software.

Magic.dev made strikingly similar claims two years ago, and then went quiet

Perhaps the most pointed critique of SubQ's launch comes not from its specific claims but from recent history. Magic.dev announced a 100-million-token context-window model, LTM-2-mini, in August 2024, with a claimed 1,000x efficiency advantage, and raised roughly $500 million on the strength of those claims. As of early 2026, there is no public evidence of LTM-2-mini being used outside Magic.

The parallels are uncomfortable. Both companies claimed massive context windows. Both touted roughly 1,000x efficiency gains. Both targeted software engineering as their primary use case. And both launched with limited external access.

The broader research landscape reinforces the caution. Kimi Linear, DeepSeek Sparse Attention, Mamba, and RWKV all promised subquadratic scaling, and all faced the same problem: architectures that achieve linear complexity in theory often underperform quadratic attention on downstream benchmarks at frontier scale, or they end up hybrid, mixing subquadratic layers with standard attention and losing the pure scaling benefits.

A widely cited LessWrong analysis argued that these approaches are all better thought of as 'incremental improvement number 93595 to the transformer architecture' because practical implementations remain quadratic and only improve attention by a constant factor.

Subquadratic is directly aware of this history.
Its own technical blog specifically addresses each prior approach (fixed-pattern sparse attention, state space models, hybrid architectures, and DeepSeek Sparse Attention) and argues that SSA avoids their tradeoffs. Whether it actually does remains an empirical question that only independent evaluation can settle.

A five-time founder, a former Meta engineer, and $29 million to prove the doubters wrong

The team behind the claims matters in evaluating them. CEO Justin Dangel is a five-time founder and CEO with a track record across health tech, insurance tech, and consumer goods, and his companies have scaled to hundreds of employees, attracted institutional backing, and reached liquidity. CTO Alexander Whedon previously worked as a software engineer at Meta and served as Head of Generative AI at TribeAI, where he led over 40 enterprise AI implementations.

The team includes 11 PhD researchers with backgrounds from Meta, Google, Oxford, Cambridge, ByteDance, and Adobe. That is a credible collection of talent for an architecture-level research effort. But neither co-founder has published foundational AI research, and the company has not yet released a peer-reviewed paper. The technical report is listed as coming soon.

The funding profile is unusual for a company making frontier AI claims. Subquadratic raised $29 million at a reported $500 million valuation, a steep price for a seed-stage company with no publicly available model, no peer-reviewed research, and no disclosed revenue. The investor base, led by Tinder co-founder Mateen and former SoftBank partner Villamizar, skews toward consumer tech and growth investing rather than deep technical AI research.
The company is not open-sourcing its weights but plans to offer training tools for enterprises to do their own post-training, and has set a 50-million-token context window target for Q4.

The real test for SubQ isn't benchmarks; it's whether the math survives independent scrutiny

Strip away the marketing language and the social media drama, and the underlying question Subquadratic is asking is genuinely important: Can AI systems break free of quadratic scaling without sacrificing the quality that makes them useful?

The stakes are enormous. If attention can be made truly linear without degrading retrieval and reasoning, the economics of AI shift fundamentally. Enterprise applications that today require elaborate retrieval pipelines (processing entire codebases, contracts, regulatory filings, medical records) become single-pass operations. The billions of dollars currently spent on RAG infrastructure, context management, and agentic orchestration become partially redundant.

Whedon's willingness to engage publicly with technical criticism, posting a technical blog within hours of pushback, suggests a team that understands it needs to show its work, not just describe it. And to its credit, the company acknowledged openly that it builds on open-source foundations and that its model is smaller than those at the major labs.

Every frontier model in 2026 advertises a context window of at least a million tokens, but almost none of them are actually great at making use of all that information. The gap between a nominal context window and a functional one, between what a model accepts and what it reliably reasons over, remains one of the most important unsolved problems in AI. Subquadratic says it has closed that gap. If independent evaluation confirms that claim, the implications would ripple far beyond a single startup's valuation.
If it doesn't, the company joins a growing list of long-context promises that sounded revolutionary on launch day and unremarkable six months later.

In computing, every fundamental constraint eventually falls. When it does, the breakthrough never comes from the direction the industry expected. The question hanging over Subquadratic is whether a team of 11 PhDs and a $29 million seed round actually found the answer that has eluded organizations spending thousands of times more, or whether they just found a better way to describe the problem.
