Advances in robotic skill acquisition have made it possible to build general-purpose libraries of learned skills for downstream manipulation tasks. However, naively executing these skills one after the other is unlikely to succeed without accounting for the dependencies between actions that are prevalent in long-horizon plans. We present Sequencing Task-Agnostic Policies (STAP), a scalable framework for training manipulation skills and coordinating their geometric dependencies at planning time to solve long-horizon tasks never seen by any skill during training. Given that Q-functions encode a measure of skill feasibility, we formulate an optimization problem to maximize the joint success of all skills sequenced in a plan, which we estimate by the product of their Q-values. Our experiments indicate that this objective function approximates ground truth plan feasibility and, when used as a planning objective, reduces myopic behavior and thereby promotes long-horizon task success. We further demonstrate how STAP can be used for task and motion planning by estimating the geometric feasibility of skill sequences provided by a task planner. We evaluate our approach in simulation and on a real robot. Qualitative results and code are available at https://sites.google.com/stanford.edu/stap.
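To make the planning objective concrete, the following is a minimal sketch of scoring a skill sequence by the product of its Q-values and comparing joint optimization against greedy, per-skill action selection. Everything in it is an illustrative assumption rather than the STAP implementation: the scalar state, the two toy skills (`q_push`, `q_grasp`), their dynamics (`f_push`, `f_grasp`), the action range, and the use of uniform random shooting as a stand-in optimizer, since the abstract does not specify one.

```python
import math
import random

# Toy stand-ins (assumptions, not learned models): each skill is a pair
# (Q-function, dynamics) over a scalar state x, the object's position.
# Q-values lie in [0, 1] and are read as success probabilities.
def q_push(x, a):
    return math.exp(-0.5 * a * a)  # short pushes are most reliable

def f_push(x, a):
    return x + a  # pushing shifts the object by a

def q_grasp(x, b):
    # Grasping succeeds only near x = 1, with a small grasp offset b.
    return math.exp(-4.0 * (x - 1.0) ** 2) * math.exp(-4.0 * b * b)

def f_grasp(x, b):
    return x  # grasping leaves the position unchanged

SKILLS = [(q_push, f_push), (q_grasp, f_grasp)]

def plan_score(x0, actions, skills=SKILLS):
    """Estimate plan success as the product of Q-values along the rollout."""
    x, score = x0, 1.0
    for (q, f), a in zip(skills, actions):
        score *= q(x, a)
        x = f(x, a)
    return score

def sample_action(rng):
    return rng.uniform(-2.0, 2.0)  # hypothetical action range

def plan_joint(x0, rng, n=2000):
    """Random shooting over whole action sequences, scored jointly."""
    candidates = [[sample_action(rng) for _ in SKILLS] for _ in range(n)]
    return max(candidates, key=lambda acts: plan_score(x0, acts))

def plan_greedy(x0, rng, n=2000):
    """Myopic baseline: each skill maximizes only its own Q-value."""
    x, actions = x0, []
    for q, f in SKILLS:
        a = max((sample_action(rng) for _ in range(n)), key=lambda c: q(x, c))
        actions.append(a)
        x = f(x, a)
    return actions

if __name__ == "__main__":
    rng = random.Random(0)
    joint = plan_joint(0.0, rng)
    greedy = plan_greedy(0.0, rng)
    print("joint  score:", plan_score(0.0, joint))
    print("greedy score:", plan_score(0.0, greedy))
```

In this toy setting the greedy baseline picks a near-zero push because that maximizes the push skill's own Q-value, which leaves the object far from where the grasp can succeed. The joint objective accepts a somewhat less reliable push in exchange for a much more feasible grasp, which is the kind of myopic behavior the product-of-Q-values objective is meant to reduce.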