Control Barrier Functions (CBFs) have been applied to provide safety guarantees for robot navigation. Traditional approaches use fixed CBFs during navigation and hand-tune the underlying parameters a priori. Such approaches are inefficient and vulnerable to changes in the environment. The goal of this paper is to learn CBFs for multi-robot navigation based on what the robots perceive about their environment. To guarantee the feasibility of the navigation task while ensuring robot safety, we pursue a trade-off between conservativeness and aggressiveness in robot behavior by defining dynamic, environment-aware CBF constraints. Since the explicit relationship between CBF constraints and navigation performance is challenging to model, we leverage reinforcement learning to learn time-varying CBFs in a model-free manner. We parameterize the CBF policy with graph neural networks (GNNs) and design the GNNs to be translation invariant and permutation equivariant, which lets us synthesize decentralized policies that generalize across environments. The proposed approach maintains safety guarantees (due to the underlying CBFs) while optimizing navigation performance (due to the reward-based learning). We perform simulations that compare the proposed approach with fixed CBFs tuned by exhaustive grid search. The results show that environment-aware CBFs are capable of adapting to robot movements and obstacle changes, yielding improved navigation performance and robust generalization.
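To illustrate how a CBF parameter trades conservativeness against aggressiveness, the following is a minimal sketch and not the method of this paper. It assumes single-integrator dynamics, a single circular obstacle, the barrier h(x) = ||x - x_obs||^2 - r^2, and a linear class-K term alpha * h. The function name `cbf_filter` and all of its arguments are hypothetical. With one constraint, the usual safety-filter quadratic program reduces to a closed-form projection, so no solver is needed.

```python
def cbf_filter(x, u_nom, obstacle, radius, alpha):
    """Minimally modify u_nom so that dh/dt + alpha * h >= 0 holds.

    Illustrative assumptions: dynamics x_dot = u (single integrator) and
    barrier h(x) = ||x - obstacle||^2 - radius^2 for one circular obstacle.
    """
    dx = (x[0] - obstacle[0], x[1] - obstacle[1])
    h = dx[0] ** 2 + dx[1] ** 2 - radius ** 2      # h >= 0 means "safe"
    grad = (2.0 * dx[0], 2.0 * dx[1])              # gradient of h at x

    # CBF condition for x_dot = u: grad . u + alpha * h >= 0
    slack = grad[0] * u_nom[0] + grad[1] * u_nom[1] + alpha * h
    if slack >= 0.0:
        return u_nom                               # nominal input already satisfies it

    norm_sq = grad[0] ** 2 + grad[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("gradient vanishes at the obstacle center")

    # Project u_nom onto the constraint boundary (closed-form QP solution).
    scale = -slack / norm_sq
    return (u_nom[0] + scale * grad[0], u_nom[1] + scale * grad[1])


if __name__ == "__main__":
    x, u_nom, obs, r = (-2.0, 0.0), (1.0, 0.0), (0.0, 0.0), 1.0
    print(cbf_filter(x, u_nom, obs, r, alpha=2.0))  # larger alpha: nominal input kept
    print(cbf_filter(x, u_nom, obs, r, alpha=0.5))  # smaller alpha: approach is slowed
```

In this toy setting a small alpha slows the robot well before it reaches the obstacle, while a large alpha leaves the nominal input untouched until the robot is close. A fixed, hand-tuned alpha commits to one point on that spectrum, which is the limitation that environment-aware, time-varying CBF parameters are meant to address.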