This paper presents a novel approach to learning from demonstration that enables robots to autonomously execute complex tasks in dynamic environments. We model latent tasks as probabilistic formal languages and introduce a tailored reactive synthesis framework that balances robot costs against user task preferences. Our methodology focuses on safety-constrained learning, inferring formal task specifications as Probabilistic Deterministic Finite Automata (PDFA). We adapt existing evidence-driven state-merging algorithms and incorporate safety requirements throughout the learning process so that the learned PDFA always complies with the safety constraints. Furthermore, we propose a multi-objective reactive synthesis algorithm that generates deterministic strategies guaranteed to satisfy the PDFA task while optimizing the trade-off between user preferences and robot costs, yielding a Pareto front of optimal solutions. Our approach models the interaction as a two-player game between the robot and the environment, accounting for dynamic changes. We present a computationally tractable value iteration algorithm to generate the Pareto front and the corresponding deterministic strategies. Comprehensive experimental results across various robots and tasks demonstrate the effectiveness of our algorithms, showing that the learned PDFA never includes unsafe behaviors and that the synthesized strategies consistently achieve the task while meeting both robot-cost and user-preference requirements.