Robots must be aware of the constraints in their environment in order to acquire safe policies. However, explicitly specifying every constraint in an environment is challenging. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but they tend to be computationally expensive and prone to instability. In this paper, we propose a novel Bayesian method that infers constraints from preferences over demonstrations. The main advantages of our approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simpler and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that the proposed Bayesian approach infers constraints of varying severity more accurately than state-of-the-art constraint inference methods.
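To make the idea of inferring constraints from ranked groups of demonstrations concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes, purely for illustration, that each demonstration is summarized by a nominal return and by counts of visits to candidate constraint features, that preferences follow a Bradley-Terry (logistic) model applied only to pairs drawn from different groups, that penalty weights have an exponential prior, and that the posterior is sampled with random-walk Metropolis. All names (`log_posterior`, `metropolis`, `beta`, `prior_scale`) and the toy data are hypothetical. No policy is computed inside the sampling loop, and a larger inferred weight stands for a more severe constraint.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -np.logaddexp(0.0, -x)

def log_posterior(w, rewards, counts, groups, beta=1.0, prior_scale=5.0):
    """Unnormalized log posterior over non-negative penalty weights w (assumed model)."""
    if np.any(w < 0):
        return -np.inf
    # Penalized return of each demonstration: nominal return minus constraint penalties.
    returns = rewards - counts @ w
    # diff[a, b] = returns[a] - returns[b]
    diff = returns[:, None] - returns[None, :]
    # Keep only pairs where a's group is ranked strictly better (lower index) than b's.
    # Demonstrations inside the same group are never compared.
    better = groups[:, None] < groups[None, :]
    log_lik = np.sum(log_sigmoid(beta * diff[better]))
    # Exponential prior on each weight (an assumption of this sketch).
    log_prior = -np.sum(w) / prior_scale
    return log_lik + log_prior

def metropolis(rewards, counts, groups, n_steps=4000, step=0.3, seed=0):
    """Random-walk Metropolis over the penalty weights; returns post-burn-in samples."""
    rng = np.random.default_rng(seed)
    w = np.ones(counts.shape[1])
    lp = log_posterior(w, rewards, counts, groups)
    samples = []
    for _ in range(n_steps):
        proposal = w + step * rng.normal(size=w.shape)
        lp_new = log_posterior(proposal, rewards, counts, groups)
        if np.log(rng.uniform()) < lp_new - lp:
            w, lp = proposal, lp_new
        samples.append(w.copy())
    return np.array(samples[n_steps // 2:])  # discard the first half as burn-in

# Toy data (hypothetical): six demonstrations in three ranked groups (0 = most preferred).
rewards = np.full(6, 10.0)                   # identical nominal returns
counts = np.array([[0, 0], [0, 0],           # group 0: no violations
                   [0, 2], [0, 3],           # group 1: visits feature 1 only
                   [2, 0], [3, 0]], float)   # group 2: visits feature 0 only
groups = np.array([0, 0, 1, 1, 2, 2])

samples = metropolis(rewards, counts, groups)
posterior_mean = samples.mean(axis=0)
print("posterior mean penalty per feature:", posterior_mean)
```

In this toy setup the demonstrations that touch feature 0 are ranked last, so the sampled weight for feature 0 should come out larger than the weight for feature 1, which is how the sketch represents constraints of differing severity.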