Open source Xiaomi MiMo-V2.5 and V2.5-Pro are among the most efficient (and affordable) at agentic 'claw' tasks

April 27, 2026


Xiaomi, the Chinese firm best known for its smartphones and electric vehicles, has lately been shipping some incredibly affordable and high-powered open source AI large language models. The trend continued today with the release of Xiaomi MiMo-V2.5 and Xiaomi MiMo-V2.5-Pro, both available under the permissive, enterprise-friendly MIT License, making them suitable for production use in commercial applications.


Enterprises and individual/independent developers can now download either of the models (and more Xiaomi open source options) directly from Hugging Face, modify them as needed, and run them locally or on virtual private clouds as they see fit.

The most notable attribute of these models, besides the open source licensing, is that, according to Xiaomi's published benchmarks, they are among the most efficient available for agentic "claw" tasks — that is, powering systems such as OpenClaw, NanoClaw and Hermes Agent, in which users communicate with agents directly over third-party messaging apps and have the agents go off and complete tasks on the human user's behalf, such as making and publishing marketing content, running accounts, organizing email and scheduling, etc.

As Xiaomi's ClawEval benchmark chart shows, both MiMo-V2.5 and the Pro version in particular appear near the top left of the chart, indicating high performance in completing the benchmarked claw tasks while using the fewest tokens — saving the human user money, especially in a world where more and more services, such as Microsoft's GitHub Copilot, are moving to usage-based billing (charging the human behind the agents for each token used, rather than imposing rate limits like Anthropic or providing an all-you-can-eat buffet-style subscription like OpenAI). In fact, the Pro model leads the open-source field with a 63.8% success rate while consuming only ~70K tokens per trajectory.
This is roughly 40–60% fewer tokens than Anthropic Claude Opus 4.6, Google Gemini 3.1 Pro, and OpenAI GPT-5.4 require to achieve comparable results. By combining a massive 310B-parameter architecture with a highly efficient active footprint and a native 1-million-token context window, Xiaomi MiMo is challenging the dominance of closed-source frontier models from Google and OpenAI, especially when it comes to the latest and greatest craze in enterprise AI deployments: agentic tasks and claws similar to OpenClaw.

A two-pronged pincer

Xiaomi has released two distinct versions of the model to serve different ends of the development spectrum: MiMo-V2.5 (the omni multimodal specialist) and MiMo-V2.5-Pro (the agent specialist). While the base model provides native multimodality, MiMo-V2.5-Pro is specifically engineered for long-horizon coherence and complex software engineering. On the GDPVal-AA (Elo) benchmark, the Pro model achieved a score of 1581, surpassing competitors like Kimi K2.6 and GLM 5.1.

Xiaomi researchers further released data on several high-complexity tasks performed autonomously by V2.5-Pro:

- SysY compiler in Rust: The model implemented a complete compiler from scratch, including a lexer, parser, and RISC-V assembly backend, in 4.3 hours. Spanning 672 tool calls, the model achieved a perfect 233/233 score on hidden test suites, a task that typically takes a computer science major several weeks.
- Full-featured video editor: Over 11.5 hours and 1,868 tool calls, the model produced an 8,192-line desktop application featuring multi-track timelines and an export pipeline.
- Analog EDA optimization: In a graduate-level engineering task, the model optimized a Flipped-Voltage-Follower (FVF-LDO) regulator in the TSMC 180nm process.
By iterating through an ngspice simulation loop, the model improved metrics like line regulation by 22x over its initial attempt. These experiments highlight the "harness awareness" of V2.5-Pro, whereby the model actively manages its own memory and shapes its context to sustain coherence over thousands of sequential tool calls.

Over the API, Xiaomi is pricing the models at competitive rates for both domestic (Chinese) and international markets (like the U.S.). For overseas developers, the high-performance MiMo-V2.5-Pro is priced at $1.00 per million input tokens (on a cache miss) and $3.00 per million output tokens within context windows up to 256K. For ultra-long context tasks between 256K and 1M tokens, the cost doubles to $2.00 for input and $6.00 for output, though the architecture's caching capabilities offer significant relief, reducing input costs to as little as $0.20 to $0.40 per million tokens on a cache hit. Domestically, these rates are mirrored in yuan, with the Pro model starting at ¥7.00 per million input tokens for standard context and reaching ¥14.00 for the extended 1M range.
Meanwhile, the base model starts at just $0.40 per million input tokens overseas and $2.00 per million output tokens, putting it among the more affordable third of leading LLMs globally (see our chart below):

| Model | Input | Output | Total Cost | Source |
| --- | --- | --- | --- | --- |
| Grok 4.1 Fast | $0.20 | $0.50 | $0.70 | xAI |
| MiniMax M2.7 | $0.30 | $1.20 | $1.50 | MiniMax |
| MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo |
| Gemini 3 Flash | $0.50 | $3.00 | $3.50 | Google |
| Kimi-K2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo |
| MiMo-V2.5-Pro (256K) | $1.00 | $3.00 | $4.00 | Xiaomi MiMo |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| GLM-5-Turbo | $1.20 | $4.00 | $5.20 | Z.ai |
| DeepSeek V4 Pro | $1.74 | $3.48 | $5.22 | DeepSeek |
| GLM-5.1 | $1.40 | $4.40 | $5.80 | Z.ai |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Claude Opus 4.7 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 | OpenAI |

(All prices in USD per million tokens.)

To lower the barrier for agentic development further, Xiaomi has made cache writing free of charge for a limited time across all models, alongside a total fee waiver for the entire MiMo-V2.5-TTS suite, which includes its specialized voice cloning and design features.
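Under the overseas rates quoted above, the per-trajectory economics are straightforward to sketch. The 80/20 input/output token split below is an illustrative assumption for agentic workloads, not a figure published by Xiaomi:

```python
# Rough cost model for one agentic trajectory at the article's quoted
# overseas rates (standard <=256K context, cache miss).
# The 80/20 input/output split is an illustrative assumption.

PRICE_PER_M = {
    "mimo-v2.5-pro": {"input": 1.00, "output": 3.00},  # USD per 1M tokens
    "mimo-v2.5":     {"input": 0.40, "output": 2.00},
}

def trajectory_cost(model: str, total_tokens: int, input_share: float = 0.8) -> float:
    """Estimate the USD cost of one agent trajectory from its total token usage."""
    rates = PRICE_PER_M[model]
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# ~70K tokens per ClawEval trajectory, as reported for the Pro model:
cost = trajectory_cost("mimo-v2.5-pro", 70_000)  # on the order of ten cents
```

Even under less favorable input/output splits, a single multi-hour trajectory at these rates stays well under a dollar, which is the efficiency argument the ClawEval chart is making.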
This pricing logic is clearly designed to accelerate the transition from simple chat applications to persistent, long-horizon agents that can operate at a fraction of the cost of legacy frontier models.

Xiaomi has also introduced an overhauled version of its subscription offerings, called the Token Plan, now available in four tiers:

- The Lite Starter Pack provides 720 million credits for $63.36 per year
- The Standard tier offers 2.4 billion credits for $168.96 per year
- The Pro tier provides 8.4 billion credits for $528.00 per year (designed for enterprise use cases)
- The Max tier, aimed at high-intensity coding enthusiasts, delivers 19.2 billion credits for $1,056.00 per year

Beyond credit allotments, all plans include preferential API rates, a 20% discount for off-peak calls, and Day-0 support for popular coding scaffolds like Cursor, Zed, and Claude Code.

However, both through the API and via the Token Plan, accessing the Xiaomi models from China may present barriers or additional compliance and regulatory risks to U.S.-based enterprise customers. As such, the best bet for U.S. enterprises concerned about relying on Chinese tech, but wanting to take advantage of the low-cost open source models, is likely setting up their own virtual private clouds or local servers, downloading the model weights, and running the models domestically.

MoE architecture but divergent training regimens for V2.5 and V2.5-Pro

At the heart of MiMo-V2.5 is a sparse Mixture-of-Experts (MoE) architecture. While the model boasts a total of 310 billion parameters, only 15 billion are active during any given inference cycle. V2.5-Pro, meanwhile, is a 1.02 trillion-parameter Mixture-of-Experts model with 42 billion active parameters. In either case, the design functions much like a specialized research hospital: while the facility has hundreds of doctors (parameters), only the specific specialists required for a particular case (query) are called into the room.
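The sparse-activation idea can be sketched with a toy top-k router. The dimensions, expert count, and k below are illustrative, not MiMo's actual configuration; the point is simply that only the selected "specialists" do any work per token:

```python
import numpy as np

# Toy top-k mixture-of-experts layer: a gate scores all experts, but only
# the k highest-scoring experts (the "specialists called into the room")
# actually run, so active compute is a small fraction of total parameters.
# All sizes here are illustrative, not MiMo's real configuration.

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, k=2):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ gate_w                      # one gate score per expert
    top_k = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                 # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

d, n_experts = 16, 8
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

x = rng.normal(size=d)
y = moe_forward(x, gate_w, experts)          # only 2 of the 8 expert matrices are used
```

Scaled up, this is why a 310B-parameter model can serve queries at roughly the cost of a 15B dense model: the gate picks a handful of experts per token and the rest of the parameters sit idle.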
This massive increase in parameter volume for the Pro version provides the neural capacity required for the deep, multi-step reasoning found in complex software engineering and long-horizon tasks, as though even more specialists are available in an even larger hospital.

According to Xiaomi's blog post, the regular V2.5 follows a rigorous five-stage evolution:

1. Text pre-training: Building a massive language backbone on 48 trillion tokens.
2. Projector warmup: Aligning in-house audio and visual encoders with the language core.
3. Multimodal pre-training: Scaling across high-quality cross-modal data.
4. Agentic post-training: Progressively extending the context window from 32K to 1M tokens.
5. RL and MOPD: Utilizing Reinforcement Learning and Multimodal Preference Optimization (MOPD) to sharpen real-world reasoning and perception.

The backbone utilizes a hybrid sliding-window attention architecture, inherited from MiMo-V2-Flash, which optimizes how the model retains long-range information. This technical foundation enables MiMo-V2.5 to see, hear, and reason natively, rather than relying on external plug-in tools for visual or auditory processing.

Conversely, the training of MiMo-V2.5-Pro prioritizes action space over sensory perception. Instead of sensory alignment, the Pro model's training focus shifts toward scaling post-training compute. This process is designed to instill harness awareness, where the model is specifically trained to manage its own memory and context within autonomous agent scaffolds like Claude Code or OpenCode. While the base V2.5 model is trained to reason across modalities, the Pro version is trained to sustain coherence across more than a thousand sequential tool calls.

The standard V2.5 model balances local and global attention to maintain multimodal perception. The Pro model, however, utilizes an increased hybrid attention ratio, evolving from the 5:1 ratio of previous generations to a more aggressive 7:1 ratio.
This allows the Pro model to skim the vast majority of its context while applying high-density attention to the specific 15% of data most relevant to its current objective, a critical feature for debugging large repositories or optimizing graduate-level circuits.

Finally, while both models undergo Reinforcement Learning (RL) and Multimodal Preference Optimization (MOPD), the objectives of these stages differ. For MiMo-V2.5, the RL stage is used to sharpen perception and multimodal reasoning. For MiMo-V2.5-Pro, RL is focused on instruction following within agentic scenarios, ensuring the model adheres to subtle requirements embedded deep within ultra-long contexts and recovers gracefully from errors during autonomous execution. This results in the Pro model's self-correcting discipline, as seen in its ability to diagnose and fix regressions during the 4.3-hour SysY compiler build.

Full MIT License is perfect for enterprise use cases

In a move that distinguishes it from many open models that ship with restrictive acceptable-use policies, Xiaomi has released MiMo-V2.5 under the MIT License. The MIT License is the gold standard of permissive software licensing.
For developers and enterprises, this means:

- No authorization required: Companies can deploy the model commercially without seeking explicit permission from Xiaomi.
- Continued training: Developers are free to fine-tune the model on proprietary data and even release those derivative weights.
- Unrestricted commercial use: There are no revenue caps or user-base limits of the kind that often plague community licenses.

By choosing MIT over a custom open-weights license, Xiaomi is positioning MiMo as the foundational infrastructure for the next generation of AI agents, effectively inviting the global developer community to treat the model as a public utility.

Xiaomi's background: from smartphones and EVs to Chinese open source AI darling

Xiaomi's pivot toward frontier AI agents is the logical culmination of a decade spent building one of the world's densest hardware-software flywheels. Founded in 2010 as a smartphone disruptor, the Beijing-based company has executed a high-stakes transition into a vertically integrated powerhouse defined by its "Human x Car x Home" strategy. This ecosystem now encompasses over 823 million connectable smart devices unified under the HyperOS architecture.

The company's 2024 entry into the automotive sector with the SU7, and the subsequent high-performance YU7 SUV, served as a proof of concept for this integration, positioning Xiaomi as a direct competitor to global luxury marques. By investing 200 billion yuan (~$29 billion USD) into foundational R&D for chips and operating systems, Xiaomi has moved beyond consumer electronics assembly; it has become an architect of the action space, using its massive hardware footprint as the primary testing ground for the agentic intelligence found in the MiMo-V2.5 series.

Ecosystem support

The release has been met with immediate Day-0 support from the broader AI ecosystem. The MiMo team announced that SGLang and vLLM, two of the most popular high-throughput inference engines, supported the V2.5 series at launch.
This was made possible through hardware partnerships with AWS, AMD, T-HEAD, and Enflame, ensuring the model can run efficiently on everything from cloud-based H100s to domestic Chinese accelerators.

Fuli Luo, the project lead at Xiaomi MiMo and a former key member of the DeepSeek team, underscored the philosophy behind the release on X (formerly Twitter): "A model's value isn't measured by rankings alone — it's measured by the problems it solves. Let's build with MiMo now!"

To kickstart this building phase, Luo announced a 100-trillion free token grant for builders and creators. This massive incentive is designed to lower the barrier to entry for developers who want to experiment with the 1M context window without immediate financial risk.

The economic realignment: open source vs. metered proprietary

The launch arrives at a critical juncture for AI economics. The shift toward usage-based billing marks the definitive end of the all-you-can-eat buffet era for AI services, a trend underscored by GitHub's announcement today that its AI coding assistant, GitHub Copilot, will transition all plans to metered, token-based credits. As seat-based predictability gives way to consumption-driven costs, premium agentic workflows, which can consume millions of tokens in a single reasoning session, are becoming increasingly difficult for enterprises to budget.

User sentiment has turned predictably cynical, with developers lamenting that they will "get less, but pay the same price" as subscriptions convert into finite allotments. This pricing evolution significantly enhances the strategic appeal of the MiMo series. By releasing under a permissive MIT License, Xiaomi allows organizations to bypass the escalating SaaS tax and reclaim financial predictability through private deployment.

Crucially, Xiaomi has eliminated the "context tax" for its API.
The 1-million-token context window is now billed at the standard rate (1 token = 1 credit for V2.5, 2 credits for the Pro version) with no additional multiplier. This stands in stark contrast to the industry-wide move toward session-based caps, positioning MiMo as a refuge for cost-sensitive, high-volume development.

Analysis for enterprises

The launch of MiMo-V2.5 is more than just a weight drop; it is a declaration of independence for the open-source community. By matching Claude Sonnet 4.6 in multimodal agentic work and Gemini 3 Pro in video understanding, Xiaomi has proven that the gap between closed-door labs and open research is effectively closed. With the MIT License as a catalyst and a 100T token grant as fuel, the coming months will likely see a surge in specialized, agentic applications built on the MiMo backbone.

Confirming the project's ambitious trajectory, the team noted it is already training the next generation, focusing on deeper reasoning and richer real-world grounding. For now, MiMo-V2.5 stands as a testament to the power of sparse architectures and permissive licensing in the race toward functional AGI.

VentureBeat