GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI agent designed for robust real-world applications. The model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to suit a range of downstream application scenarios. Compared with our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage that uses 10 billion tokens drawn from more than 30 datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, which aligns the training objective with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI agent constructed via Model Merging, which combines domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations show that UI-Venus-1.5 achieves new state-of-the-art results on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
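The abstract does not say how the grounding, web, and mobile checkpoints are combined. For orientation only, the simplest form of model merging is a weighted average of parameters across checkpoints that share one architecture. The sketch assumes that setup; the `Checkpoint` type, the `linear_merge` function, the parameter names, and the weights are hypothetical illustrations, not UI-Venus-1.5's actual procedure.

```python
from typing import Dict, List

# Hypothetical simplified checkpoint: parameter name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def linear_merge(checkpoints: List[Checkpoint], weights: List[float]) -> Checkpoint:
    """Merge same-architecture checkpoints by a weighted average of parameters.

    Generic linear weight averaging; an assumption for illustration, not
    necessarily the merging method used by UI-Venus-1.5.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merged parameters are a convex combination of the inputs.
    norm = [w / total for w in weights]

    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")

    merged: Checkpoint = {}
    for name in names:
        length = len(checkpoints[0][name])
        if any(len(ckpt[name]) != length for ckpt in checkpoints):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(length)
        ]
    return merged


# Usage with three toy "domain-specific" checkpoints (hypothetical values).
grounding = {"proj.weight": [1.0, 2.0], "proj.bias": [0.0]}
web = {"proj.weight": [3.0, 4.0], "proj.bias": [1.0]}
mobile = {"proj.weight": [5.0, 6.0], "proj.bias": [2.0]}

merged = linear_merge([grounding, web, mobile], weights=[0.5, 0.25, 0.25])
print(merged)  # {'proj.weight': [2.5, 3.5], 'proj.bias': [0.75]}
```

Normalizing the weights keeps each merged parameter inside the range spanned by the inputs. Practical merging methods often refine this basic average, for example with per-layer weights or task-vector arithmetic.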