UI-Venus-1.5 Technical Report

Venus Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou, Linchao Zhu, Liang Chen, Zhenyu Guo, Changhua Meng, Weiqiang Wang

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI agent designed for robust real-world applications. The model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to cover a range of downstream application scenarios. Compared with our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage that uses 10 billion tokens drawn from more than 30 datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, which aligns the training objective with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI agent built via Model Merging, which combines domain-specific models (grounding, web, and mobile) into one checkpoint. In extensive evaluations, UI-Venus-1.5 sets new state-of-the-art results on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. UI-Venus-1.5 also navigates robustly across a variety of Chinese mobile apps, executing user instructions effectively in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
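
The abstract does not say which merging procedure is used to combine the grounding, web, and mobile models. As a rough illustration of the general idea only, the sketch below applies plain weighted linear averaging of parameters, one common form of model merging. The function name `merge_checkpoints`, the toy parameter values, and the equal mixing weights are all assumptions made for this example and are not taken from the report.

```python
from typing import Dict, List, Sequence

# A checkpoint is modeled as: parameter name -> flattened list of weights.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[Checkpoint],
                      weights: Sequence[float]) -> Checkpoint:
    """Weighted linear average of checkpoints that share one architecture.

    Assumption: simple parameter averaging, not necessarily the report's method.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(weights):
        raise ValueError("one mixing weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("mixing weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints must have identical parameter names")

    merged: Checkpoint = {}
    for name in checkpoints[0]:
        size = len(checkpoints[0][name])
        if any(len(c[name]) != size for c in checkpoints):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted sum across all source checkpoints.
        merged[name] = [
            sum(w * c[name][i] for w, c in zip(norm, checkpoints))
            for i in range(size)
        ]
    return merged


# Toy stand-ins for the three domain-specific models (hypothetical values).
grounding = {"layer.weight": [1.0, 2.0]}
web = {"layer.weight": [3.0, 4.0]}
mobile = {"layer.weight": [5.0, 6.0]}

unified = merge_checkpoints([grounding, web, mobile], [1.0, 1.0, 1.0])
```

With real models the same loop would run over tensors from each checkpoint's state dictionary, and the mixing weights would typically be tuned on held-out tasks from each domain.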
