Mobile agents can autonomously execute user instructions, which requires hybrid-capability reasoning spanning screen summarization, subtask planning, action decision, and action execution. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these capabilities. To address these challenges, we propose Channel-of-Mobile-Experts (CoME), a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage. Through output-oriented activation, CoME activates the corresponding expert to generate the output tokens of each reasoning stage. To equip CoME with hybrid-capability reasoning, we introduce a progressive training strategy: Expert-FT decouples and enhances the capabilities of the individual experts; Router-FT aligns expert activation with the corresponding reasoning stages; CoT-FT facilitates seamless collaboration and balanced optimization across multiple capabilities. To mitigate error propagation in hybrid-capability reasoning, we propose InfoGain-Driven DPO (Info-DPO), which uses information gain to evaluate the contribution of each intermediate step, thereby guiding CoME toward more informative reasoning. Comprehensive experiments show that CoME outperforms dense mobile agents and MoE-based methods on both the AITZ and AMEX datasets.