Seamlessly integrating rules into Learning-from-Demonstrations (LfD) policies is a critical requirement for the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) to integrate STL specifications into a vanilla LfD policy and thereby improve constraint satisfaction. We propose augmenting the MCTS heuristic with STL robustness values to bias the tree search towards branches with higher constraint satisfaction. Although the method is domain-independent and can integrate STL rules online into any pre-trained LfD algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning as the offline LfD policy. We apply the proposed method to planning trajectories for General Aviation aircraft around a non-towered airfield. Results from a simulator trained on real-world data show a 60% improvement in performance over baseline LfD methods that do not use STL heuristics.
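The idea of biasing tree search with STL robustness can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the helper names (`always_geq`, `eventually_leq`, `selection_score`, `pick_branch`), the PUCT-style exploration term, the additive `tanh`-squashed robustness bonus, and the toy altitude traces are all hypothetical. The sketch only shows how a quantitative robustness value (positive when a rule is satisfied, negative when it is violated) could shift the selection score of a branch relative to the prior given by a learned policy.

```python
import math
from typing import Dict, List, Sequence


def always_geq(signal: Sequence[float], threshold: float) -> float:
    """Robustness of 'always (x >= threshold)': the worst margin along the trace."""
    return min(x - threshold for x in signal)


def eventually_leq(signal: Sequence[float], threshold: float) -> float:
    """Robustness of 'eventually (x <= threshold)': the best margin along the trace."""
    return max(threshold - x for x in signal)


def selection_score(
    prior: float,
    q_value: float,
    robustness: float,
    parent_visits: int,
    child_visits: int,
    c_puct: float = 1.0,
    w_stl: float = 1.0,
) -> float:
    """PUCT-style score plus an additive robustness bonus (assumed form)."""
    explore = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    # tanh keeps the bonus bounded so it cannot swamp the other terms.
    return q_value + explore + w_stl * math.tanh(robustness)


def pick_branch(branches: List[Dict], min_altitude: float, w_stl: float = 1.0) -> int:
    """Return the index of the branch with the highest score.

    Each branch carries a policy prior and a predicted altitude trace
    (hypothetical fields); the rule is 'always altitude >= min_altitude'.
    """
    scores = []
    for branch in branches:
        rho = always_geq(branch["trace"], min_altitude)
        scores.append(selection_score(branch["prior"], 0.0, rho, 1, 0, w_stl=w_stl))
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: the policy prefers branch 0, but branch 0 violates the altitude rule.
branches = [
    {"prior": 0.6, "trace": [5.0, 2.5, 1.0]},
    {"prior": 0.4, "trace": [5.0, 4.0, 3.0]},
]
```

With the toy data above, `pick_branch(branches, min_altitude=2.0)` selects branch 1, whose trace satisfies the rule, while setting `w_stl=0.0` falls back to the policy prior and selects branch 0. The weight `w_stl` is a hypothetical knob for trading off imitation against rule satisfaction.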