In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, which pose significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this paper, we investigate robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model. Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best-Pose (NBP) policy and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach allows the agent to adjust a third-person camera to actively observe the environment according to the task goal and then infer the appropriate manipulation actions. We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results show that our model consistently outperforms baseline algorithms, demonstrating its effectiveness in handling visual constraints in manipulation tasks.
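The serial connection of the two policies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `active_vision_action_step`, the type aliases, the viewpoint and pose parameterizations, and the toy policies are all hypothetical stand-ins chosen for illustration, and no RLBench API is used. The sketch only shows the ordering: the NBV policy selects a camera viewpoint first, and the NBP policy then infers a gripper pose from the observation taken at that new viewpoint.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical parameterizations, assumed for illustration only.
Viewpoint = Tuple[float, float, float]           # (radius, azimuth_deg, elevation_deg)
GripperPose = Tuple[float, float, float, float]  # (x, y, z, gripper_open)
Observation = Dict[str, Any]


def active_vision_action_step(
    observe: Callable[[Viewpoint], Observation],
    nbv_policy: Callable[[Observation, str], Viewpoint],
    nbp_policy: Callable[[Observation, str], GripperPose],
    viewpoint: Viewpoint,
    task_goal: str,
) -> Tuple[Viewpoint, GripperPose]:
    """Run one serial NBV -> NBP step and return (next_view, gripper_pose)."""
    # Stage 1: observe from the current viewpoint and pick the next-best view.
    obs = observe(viewpoint)
    next_view = nbv_policy(obs, task_goal)

    # Stage 2: re-observe from the chosen viewpoint, then infer the gripper pose.
    new_obs = observe(next_view)
    pose = nbp_policy(new_obs, task_goal)
    return next_view, pose


# Toy stand-ins (assumptions): a real system would render camera images
# and use learned policies instead of these fixed rules.
def toy_observe(viewpoint: Viewpoint) -> Observation:
    return {"viewpoint": viewpoint}


def toy_nbv_policy(obs: Observation, task_goal: str) -> Viewpoint:
    radius, azimuth, elevation = obs["viewpoint"]
    return (radius, azimuth + 30.0, elevation)  # rotate the camera by 30 degrees


def toy_nbp_policy(obs: Observation, task_goal: str) -> GripperPose:
    _, azimuth, _ = obs["viewpoint"]
    # Open the gripper only if the observation comes from the rotated view.
    return (0.1, 0.2, 0.3, 1.0 if azimuth >= 30.0 else 0.0)


next_view, pose = active_vision_action_step(
    toy_observe, toy_nbv_policy, toy_nbp_policy, (1.0, 0.0, 45.0), "open the drawer"
)
```

In this toy setup the gripper pose depends on the viewpoint selected in the first stage, which mirrors the dependency the abstract describes: the manipulation action is inferred only after the camera has been actively repositioned.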