Understanding the progress of a task allows humans not only to track what has been done but also to plan better for future goals. We present TaKSIE, a novel framework that incorporates task progress knowledge into visual subgoal generation for robotic manipulation tasks. We jointly train a recurrent network with a latent diffusion model to generate the next visual subgoal based on the robot's current observation and the input language command. At execution time, the robot uses a visual progress representation to monitor task progress and adaptively samples the next visual subgoal from the model to guide the manipulation policy. We train and validate our model on simulated and real-world robotic tasks, achieving state-of-the-art performance on the CALVIN manipulation benchmark. We find that including task progress knowledge can improve the robustness of the trained policy to different initial robot poses or various movement speeds during demonstrations. Project website: https://live-robotics-uva.github.io/TaKSIE/
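The execution-time behavior described above (monitor progress, then adaptively sample the next subgoal) can be illustrated with a small toy loop. This is only a sketch under strong assumptions, not the TaKSIE implementation: observations are integers on a line rather than images, and `ToyEnv`, `generate_subgoal`, `estimate_progress`, `policy`, `run_episode`, and the 0.8 threshold are all hypothetical stand-ins for the learned subgoal generator, the visual progress representation, and the manipulation policy.

```python
from typing import List, Tuple


class ToyEnv:
    """1-D stand-in for a manipulation scene; the 'observation' is an integer position."""

    def __init__(self, start: int = 0, target: int = 20) -> None:
        self.position = start
        self.target = target

    def observe(self) -> int:
        return self.position

    def step(self, action: int) -> None:
        self.position += action

    def done(self) -> bool:
        return self.position >= self.target


def generate_subgoal(obs: int, instruction: str, stride: int = 5, target: int = 20) -> int:
    # Hypothetical placeholder for the learned subgoal generator.
    # The toy version ignores the instruction and proposes a point a few steps ahead.
    return min(obs + stride, target)


def estimate_progress(anchor: int, obs: int, subgoal: int) -> float:
    # Hypothetical placeholder for the progress estimate: the fraction of the way
    # from where the current subgoal was sampled (anchor) to the subgoal itself.
    span = subgoal - anchor
    if span == 0:
        return 1.0
    return (obs - anchor) / span


def policy(obs: int, subgoal: int) -> int:
    # Hypothetical subgoal-conditioned policy: move one unit toward the subgoal.
    if obs < subgoal:
        return 1
    if obs > subgoal:
        return -1
    return 0


def run_episode(env: ToyEnv, instruction: str, threshold: float = 0.8,
                max_steps: int = 100) -> Tuple[int, List[int]]:
    obs = env.observe()
    anchor = obs
    subgoal = generate_subgoal(obs, instruction)
    history = [subgoal]
    for _ in range(max_steps):
        if env.done():
            break
        env.step(policy(obs, subgoal))
        obs = env.observe()
        # Sample a new subgoal only once progress toward the current one is high
        # enough, instead of resampling on a fixed schedule.
        if not env.done() and estimate_progress(anchor, obs, subgoal) >= threshold:
            anchor = obs
            subgoal = generate_subgoal(obs, instruction)
            history.append(subgoal)
    return obs, history


final_obs, history = run_episode(ToyEnv(), "move to the target")
print(final_obs, history)
```

Because resampling is triggered by estimated progress rather than by elapsed time, the same loop behaves sensibly whether the policy moves quickly or slowly, which is the intuition behind the robustness claim in the abstract.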