Recent advances in robot skill learning have made it possible to construct task-agnostic skill libraries, in which multiple simple manipulation primitives (i.e., skills) can be sequenced to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence of independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we use Tensor Train to construct the value function space, and alternate between symbolic search and skill value optimization to find the appropriate skill skeleton and the optimal subgoal sequence. Experimental results indicate that the obtained value functions approximate cumulative rewards more accurately than state-of-the-art Reinforcement Learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. Real-robot experiments showcase the effectiveness of our approach in coping with contact uncertainty and external disturbances in the real world.
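To make the alternation between symbolic search and skill value optimization more concrete, the following is a minimal toy sketch, not the LSP implementation. It assumes a hypothetical one-dimensional grid world, two hand-written value functions (`v_push`, `v_pick_place`) standing in for learned ones, brute-force enumeration of skill skeletons in place of symbolic search, and exhaustive dynamic programming over grid subgoals in place of Tensor Train-based optimization. All names and numbers are illustrative assumptions.

```python
import itertools

# Hypothetical toy domain: the state is an object position on an integer grid.
GRID = list(range(11))  # cells 0..10
NEG_INF = float("-inf")

def v_push(s, g):
    # Assumed value of "push": cost grows with distance, reach limited to 3 cells.
    d = abs(g - s)
    return -d if d <= 3 else NEG_INF

def v_pick_place(s, g):
    # Assumed value of "pick_place": any distance, but with a fixed overhead.
    return -5 - 0.1 * abs(g - s)

SKILLS = {"push": v_push, "pick_place": v_pick_place}

def best_subgoals(skeleton, start, goal):
    """For a fixed skeleton, maximize the sum of skill values over subgoals."""
    # best[state] = (accumulated value, subgoal path) after the skills applied so far
    best = {start: (0.0, [start])}
    for i, name in enumerate(skeleton):
        # The last skill must end exactly at the geometric goal.
        targets = [goal] if i == len(skeleton) - 1 else GRID
        nxt = {}
        for s, (val, path) in best.items():
            for g in targets:
                cand = val + SKILLS[name](s, g)
                if cand > nxt.get(g, (NEG_INF,))[0]:
                    nxt[g] = (cand, path + [g])
        best = nxt
    return best.get(goal, (NEG_INF, []))

def plan(start, goal, max_len=3):
    """Outer loop over skeletons, inner loop over subgoals; keep the best pair."""
    best = (NEG_INF, None, [])
    for n in range(1, max_len + 1):
        for skeleton in itertools.product(SKILLS, repeat=n):
            value, path = best_subgoals(skeleton, start, goal)
            if value > best[0]:  # strict: ties keep the shorter, earlier skeleton
                best = (value, skeleton, path)
    return best

if __name__ == "__main__":
    print(plan(0, 4))  # nearby goal
    print(plan(0, 9))  # distant goal
```

In this toy setting, a nearby goal favors chaining short pushes, while a distant goal favors a single pick-and-place, which illustrates why the skeleton and the subgoals have to be chosen jointly when only the final configuration is specified.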