Recent advances in Vision-Language-Action (VLA) models have demonstrated impressive generalist capabilities in robot manipulation, yet these policies can be brittle under out-of-distribution spatial and semantic perturbations. Human teleoperation offers reliable recovery, but it can demand high cognitive load and precise manual control, and existing policy steering methods often require auxiliary models or sampler modifications. In this work, we introduce Shared Autonomy for Policy Steering (SAPS), a framework that blends real-time human teleoperation commands with pretrained policy actions at the action level. SAPS requires no policy retraining, auxiliary dynamics models, or architectural modifications. We propose and evaluate three arbitration strategies to balance human and VLA policy control, including a dynamic cosine-similarity arbitration strategy that computes the geometric agreement between human and policy actions. Across evaluations in simulation (LIBERO, LIBERO-PRO, CALVIN) and on real-world robot hardware, SAPS improves task success rates over autonomous execution by up to 82% in both simulation and the real world. Furthermore, our approach drastically reduces human intervention compared to pure teleoperation, while achieving faster task completion times than both autonomous execution and pure teleoperation. These results demonstrate that action-level shared autonomy is a practical, model-agnostic approach for reliably deploying generalist robot policies in real-world contexts involving a human operator, with promising applications in assistive teleoperation and scalable data collection.
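To make the idea of action-level blending concrete, the following is a minimal sketch of a cosine-similarity arbitration rule. The abstract does not give the exact formula, so everything here is an assumption for illustration: the function names (`cosine_similarity`, `human_weight`, `blend_actions`), the linear mapping from cosine similarity to the human blend weight, and the handling of idle inputs are hypothetical and may differ from what SAPS actually does.

```python
import numpy as np

EPS = 1e-8  # threshold below which an action is treated as "idle" (assumed)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero action vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def human_weight(human: np.ndarray, policy: np.ndarray) -> float:
    """Hypothetical arbitration rule: return the human's share in [0, 1].

    Assumption: when the two actions agree (cos = 1) the policy keeps
    control, and when they oppose each other (cos = -1) the human takes over.
    """
    if np.linalg.norm(human) < EPS:
        return 0.0  # operator is idle: let the policy act alone
    if np.linalg.norm(policy) < EPS:
        return 1.0  # policy outputs no motion: follow the operator
    cos = cosine_similarity(human, policy)
    return float(np.clip(0.5 * (1.0 - cos), 0.0, 1.0))


def blend_actions(human_action, policy_action) -> np.ndarray:
    """Blend a teleoperation command with a policy action at the action level."""
    h = np.asarray(human_action, dtype=float)
    p = np.asarray(policy_action, dtype=float)
    alpha = human_weight(h, p)
    return alpha * h + (1.0 - alpha) * p


if __name__ == "__main__":
    policy = np.array([0.0, 1.0, 0.0])  # e.g. a Cartesian velocity command
    human = np.array([1.0, 0.0, 0.0])   # operator pushes in an orthogonal direction
    print(blend_actions(human, policy))
```

Because the blend operates only on the two action vectors, a rule of this shape needs no access to the policy's weights or sampler, which is consistent with the model-agnostic claim in the abstract.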