We address the problem of inverse constraint learning (ICL), which recovers constraints from demonstrations so that constrained skills can be reproduced autonomously in new scenarios. However, ICL is inherently ill-posed, which leads to inaccurate inference of constraints from demonstrations. To address this, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling learned skills to generalize. TCL additively decomposes the overall reward into a task reward and a residual that acts as a soft constraint, and it maximizes the policy divergence between the task-oriented and constraint-oriented policies to obtain a transferable constraint. Evaluating our method against five baselines in three simulated environments, we show that TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to $72\%$ higher task-success rates than the next-best approach in novel scenarios while accurately decomposing the reward. Further, we demonstrate the robustness of TCL on two real-world robotic tasks.
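To make the decomposition idea concrete, the following toy sketch scores candidate splits of an overall reward $r = r_{\text{task}} + r_{\text{c}}$ by how far apart the policies induced by each component are. It is an illustration under loudly stated assumptions, not the TCL algorithm: it assumes a single state with four discrete actions, Boltzmann (softmax) policies derived directly from each reward component, KL divergence as the divergence measure, and hypothetical reward values. The names `boltzmann_policy`, `kl_divergence`, and `decomposition_score` are illustrative only.

```python
import numpy as np

def boltzmann_policy(reward, temperature=1.0):
    """Softmax (Boltzmann) policy over actions for a single state (assumption)."""
    z = (reward - reward.max()) / temperature  # shift by max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same actions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def decomposition_score(r_total, r_task, temperature=1.0):
    """Score a split r_total = r_task + r_constraint by the divergence of the
    policies each component induces (hypothetical objective for illustration)."""
    r_constraint = r_total - r_task  # residual treated as a soft constraint
    pi_task = boltzmann_policy(r_task, temperature)
    pi_constraint = boltzmann_policy(r_constraint, temperature)
    return kl_divergence(pi_task, pi_constraint), r_constraint

# Hypothetical overall reward for one state and four actions.
r_total = np.array([1.0, 0.5, -2.0, 0.2])

# Two hypothetical candidate task rewards.
candidates = {
    "all_task": r_total.copy(),                    # residual is zero everywhere
    "split": np.array([1.0, 0.8, 1.0, 0.0]),       # residual penalizes action 2
}

scores = {}
for name, r_task in candidates.items():
    score, r_constraint = decomposition_score(r_total, r_task)
    assert np.allclose(r_task + r_constraint, r_total)  # additive decomposition holds
    scores[name] = score

# Under this toy objective, prefer the split whose policies diverge most.
best = max(scores, key=scores.get)
print(scores, best)
```

In this sketch, the "all_task" split leaves a zero residual, so the constraint-oriented policy is uniform; a split that pushes a distinct penalty into the residual tends to yield a more distinct constraint policy. How TCL actually parameterizes, learns, and jointly optimizes these components is described in the paper, not here.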