We study inverse constraint learning (ICL), which recovers constraints from demonstrations so that constrained skills can be reproduced autonomously in new scenarios. However, ICL is ill-posed, which leads to inaccurate inference of constraints from demonstrations. To address this, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling the generalization of learned skills. TCL additively decomposes the overall reward into a task reward and a residual that serves as a soft constraint, and maximizes the divergence between the task- and constraint-oriented policies to obtain a transferable constraint. Evaluating our method and four baselines in three simulated environments, we show that TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to $72\%$ higher task-success rates with accurate decomposition than the next-best approach in novel scenarios. Further, we demonstrate the robustness of TCL on a real-world robotic tray-carrying task.
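As a rough illustration of the two ingredients named above, the following toy sketch decomposes a reward additively and measures the divergence between the policies induced by each part. It is not the TCL algorithm itself: the single-state setting, the reward values, the softmax policies, and the use of KL divergence are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def softmax(values):
    """Turn per-action rewards into a stochastic policy (assumed policy form)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def decompose(total_reward, task_reward):
    """Additive decomposition: the constraint term is the residual."""
    return [r - t for r, t in zip(total_reward, task_reward)]


# Hypothetical per-action rewards for a single state with three actions.
total_reward = [1.0, 0.5, -2.0]   # overall reward explaining the demonstrations
task_reward = [1.0, 0.5, 1.0]     # task-oriented part
constraint_reward = decompose(total_reward, task_reward)  # residual soft constraint

# Policies induced by each component, and the divergence between them.
task_policy = softmax(task_reward)
constraint_policy = softmax(constraint_reward)
divergence = kl_divergence(task_policy, constraint_policy)

print("constraint residual:", constraint_reward)
print("policy divergence:", divergence)
```

In this toy setting the residual penalizes only the third action, so the constraint-oriented policy avoids it regardless of the task reward. A learning procedure in the spirit of the abstract would search over decompositions and prefer those with a large divergence between the two policies; that search is omitted here.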