Aligning autonomous agents with human values is a pivotal challenge when deploying them in physical environments, where safety is an important concern. However, specifying the agent's objective as a reward and/or cost function is inherently complex and prone to human error. To address this challenge, we present a novel approach that uses one-class decision trees to learn from expert demonstrations. These decision trees provide a basis for representing a set of constraints relevant to the given environment as a logical formula in disjunctive normal form (DNF). The learned constraints are then used within an oracle constrained reinforcement learning framework to learn a safe policy. In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments. To validate the effectiveness of the proposed method, we conduct experiments in synthetic benchmark domains and a realistic driving environment.
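To illustrate how a decision tree can be read as a DNF formula, the sketch below walks every root-to-leaf path of a small binary tree, turns each path into a conjunction of threshold tests, and joins the paths into a disjunction. It is a minimal sketch under stated assumptions, not the authors' implementation: the tree is hand-built rather than learned from demonstrations, the `Leaf.feasible` flag (marking leaves whose region contains expert data) is hypothetical, and treating the feasible leaves as the satisfying region with a 0/1 `cost` outside it is our own illustrative choice. The feature names `speed` and `distance` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# A literal is one threshold test: (feature_index, operator, threshold).
Literal = Tuple[int, str, float]


@dataclass
class Leaf:
    feasible: bool  # hypothetical flag: True if the leaf region holds expert data


@dataclass
class Split:
    feature: int
    threshold: float
    left: "Node"   # branch taken when x[feature] <= threshold
    right: "Node"  # branch taken when x[feature] > threshold


Node = Union[Leaf, Split]


def tree_to_dnf(node: Node, path: Optional[List[Literal]] = None) -> List[List[Literal]]:
    """Return one conjunction (list of literals) per feasible root-to-leaf path."""
    path = path if path is not None else []
    if isinstance(node, Leaf):
        return [list(path)] if node.feasible else []
    left = tree_to_dnf(node.left, path + [(node.feature, "<=", node.threshold)])
    right = tree_to_dnf(node.right, path + [(node.feature, ">", node.threshold)])
    return left + right


def satisfies(dnf: List[List[Literal]], x: List[float]) -> bool:
    """A state satisfies the DNF if every literal of at least one clause holds."""
    return any(
        all((x[f] <= t) if op == "<=" else (x[f] > t) for f, op, t in clause)
        for clause in dnf
    )


def cost(dnf: List[List[Literal]], x: List[float]) -> float:
    """Illustrative 0/1 cost signal: 1.0 when x lies outside every feasible region."""
    return 0.0 if satisfies(dnf, x) else 1.0


def to_str(dnf: List[List[Literal]], names: List[str]) -> str:
    """Render the DNF as a human-readable formula."""
    return " OR ".join(
        "(" + " AND ".join(f"{names[f]} {op} {t}" for f, op, t in clause) + ")"
        for clause in dnf
    )


# Hypothetical tree over features x[0] = speed, x[1] = distance to the lead vehicle.
tree = Split(0, 30.0,
             Split(1, 5.0, Leaf(False), Leaf(True)),
             Split(1, 20.0, Leaf(False), Leaf(True)))
dnf = tree_to_dnf(tree)
print(to_str(dnf, ["speed", "distance"]))
# (speed <= 30.0 AND distance > 5.0) OR (speed > 30.0 AND distance > 20.0)
```

Because each clause is just a path of readable threshold tests, the resulting formula can be inspected directly, which is the kind of interpretability the abstract emphasizes; a constrained RL method could then, for example, consume `cost` as its constraint signal.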