Learning adaptive visuomotor policies for embodied agents remains a formidable challenge, particularly under cross-embodiment variations such as diverse sensor configurations and dynamic properties. Conventional learning approaches often struggle to disentangle task-relevant features from domain-specific variations (e.g., lighting, field of view, and rotation), which leads to poor sample efficiency and catastrophic failure in unseen environments. To bridge this gap, we propose ContrAstive Prompt Orchestration (CAPO), a novel approach for learning visuomotor policies that integrates contrastive prompt learning with adaptive prompt orchestration. For prompt learning, we devise a hybrid contrastive learning strategy that combines visual, temporal-action, and textual objectives to build a pool of learnable prompts, each of which induces a visual representation capturing fine-grained domain factors. Building on these learned prompts, we introduce an adaptive prompt orchestration mechanism that dynamically aggregates them conditioned on the current observation. This enables the agent to adaptively construct appropriate state representations by identifying the dominant domain factors on the fly. Consequently, policy optimization is shielded from irrelevant interference, avoiding the common problem of overfitting to source domains. Extensive experiments demonstrate that CAPO significantly outperforms state-of-the-art baselines in both sample efficiency and asymptotic performance. Crucially, it exhibits superior zero-shot adaptation to unseen target domains characterized by drastic environmental shifts (e.g., illumination) and physical shifts (e.g., field of view and rotation), validating it as a viable solution for cross-embodiment visuomotor policy adaptation.
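The abstract does not state how prompts are scored and aggregated, nor which contrastive loss each objective uses. The following is a minimal, hypothetical sketch under two assumptions: (i) orchestration is softmax attention between an observation embedding and one learnable key per prompt, and (ii) each of the visual, temporal-action, and text objectives is a standard InfoNCE loss, combined as a weighted sum. All names (`orchestrate_prompts`, `prompt_keys`, `hybrid_contrastive_loss`, and so on) are illustrative and are not taken from CAPO itself.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by zero."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def orchestrate_prompts(
    obs_embedding: Vector,
    prompt_keys: Sequence[Vector],
    prompts: Sequence[Vector],
    temperature: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """HYPOTHETICAL observation-conditioned prompt aggregation.

    Scores each prompt by a scaled dot product between the observation
    embedding (the query) and that prompt's key, converts the scores into
    weights with a softmax, and returns the weights together with the
    weighted sum of the prompt vectors.
    """
    scale = math.sqrt(len(obs_embedding)) * temperature
    scores = [dot(obs_embedding, k) / scale for k in prompt_keys]
    weights = softmax(scores)
    dim = len(prompts[0])
    aggregated = [sum(w * p[i] for w, p in zip(weights, prompts)) for i in range(dim)]
    return weights, aggregated


def info_nce(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    temperature: float = 0.1,
) -> float:
    """InfoNCE for one anchor: -log of the softmax probability of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def hybrid_contrastive_loss(
    visual: float,
    temporal_action: float,
    text: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """HYPOTHETICAL weighted sum of the three per-objective contrastive losses."""
    return weights[0] * visual + weights[1] * temporal_action + weights[2] * text


# Toy usage: an observation aligned with the first key up-weights the first prompt.
w, agg = orchestrate_prompts(
    obs_embedding=[1.0, 0.0],
    prompt_keys=[[1.0, 0.0], [0.0, 1.0]],
    prompts=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
)
print([round(x, 3) for x in w], [round(x, 3) for x in agg])
```

Because the softmax weights are non-negative and sum to one, the aggregated prompt in this sketch is always a convex combination of the pool, so an observation dominated by one domain factor mostly selects the prompt keyed to that factor.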