Many real-world manipulation scenarios, such as complex collaborative tasks and tasks spanning large workspaces, require the coordination of more than two robotic arms. Training coordinated multi-arm manipulation policies therefore calls for an effective multi-arm teleoperation system to collect demonstrations. However, existing teleoperation frameworks mainly target either single-operator or multi-operator setups and thus face a practical trade-off: a single operator bears a heavy cognitive load, whereas multiple operators incur coordination costs. To address this problem, we introduce HATS, a human-agent teleoperation system that enables a single human operator, assisted by an agent built on a multimodal large language model (MLLM), to collect data for multi-arm manipulation tasks. HATS decouples the control space: the human directly teleoperates two primary arms, while a training-free agent controls two assistive arms to handle sub-tasks. In addition, the operator can issue voice commands during execution to prevent collisions and correct the behavior of the assistive arms. Extensive evaluations show that HATS achieves data collection efficiency and success rates comparable to those of expert two-person teams. Moreover, downstream policy evaluations confirm the effectiveness and quality of the data collected with HATS.
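The abstract does not say how commands from the human, the agent, and voice input are arbitrated. The following is a minimal sketch of one way to express the decoupled control space, assuming each arm has exactly one owner (human for the two primary arms, agent for the two assistive arms) and that a voice "stop"/"resume" can pause an assistive arm. Every name here (`ControlRouter`, the arm identifiers, the two-word voice vocabulary) is hypothetical and is not the HATS interface; a real system would replace the keyword parser with the agent's language understanding.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Source(Enum):
    HUMAN = "human"  # the operator's teleoperation device
    AGENT = "agent"  # the MLLM-based assistive agent


@dataclass
class ArmCommand:
    arm: str
    source: Source
    target: Tuple[float, ...]  # hypothetical pose or joint target


@dataclass
class ControlRouter:
    """Hypothetical router that splits four arms between the human and the agent."""

    primary_arms: Tuple[str, ...] = ("primary_left", "primary_right")
    assistive_arms: Tuple[str, ...] = ("assist_left", "assist_right")
    paused: Set[str] = field(default_factory=set)

    def owner(self, arm: str) -> Source:
        # Each arm has exactly one owner: this is the decoupled control space.
        if arm in self.primary_arms:
            return Source.HUMAN
        if arm in self.assistive_arms:
            return Source.AGENT
        raise KeyError(f"unknown arm: {arm}")

    def apply_voice(self, utterance: str) -> None:
        # Toy keyword parser standing in for real speech/intent handling:
        # "stop assist_left" halts that assistive arm, "resume assist_left" releases it.
        verb, _, arm = utterance.strip().partition(" ")
        if arm not in self.assistive_arms:
            return  # in this sketch, voice only affects agent-controlled arms
        if verb == "stop":
            self.paused.add(arm)
        elif verb == "resume":
            self.paused.discard(arm)

    def route(self, commands: List[ArmCommand]) -> Dict[str, Tuple[float, ...]]:
        """Keep only commands issued by the arm's owner and not paused by the operator."""
        accepted: Dict[str, Tuple[float, ...]] = {}
        for cmd in commands:
            if self.owner(cmd.arm) is not cmd.source:
                continue  # e.g. the agent may not drive a primary arm
            if cmd.arm in self.paused:
                continue  # the operator's voice override takes precedence
            accepted[cmd.arm] = cmd.target
        return accepted


if __name__ == "__main__":
    router = ControlRouter()
    router.apply_voice("stop assist_right")
    cmds = [
        ArmCommand("primary_left", Source.HUMAN, (0.1, 0.2, 0.3)),
        ArmCommand("primary_right", Source.AGENT, (0.0, 0.0, 0.0)),
        ArmCommand("assist_left", Source.AGENT, (0.4, 0.5, 0.6)),
        ArmCommand("assist_right", Source.AGENT, (0.7, 0.8, 0.9)),
    ]
    print(sorted(router.route(cmds)))  # ['assist_left', 'primary_left']
```

Giving each arm a single owner keeps the human and agent from ever issuing conflicting commands to the same arm, so the only cross-channel interaction left is the operator's override, which is modeled here as a pause set checked before any agent command is accepted.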