Air-ground robots (AGRs) are widely used in surveillance and disaster response due to their exceptional mobility and versatility (i.e., flying and driving). Current AGR navigation systems perform well in static, occlusion-prone environments (e.g., indoors) by using 3D semantic occupancy networks to predict occlusions for complete local mapping and then computing a Euclidean Signed Distance Field (ESDF) for path planning. However, these systems struggle in dynamic, severely occluded scenes (e.g., crowds) because of the low prediction accuracy of perception networks and the high computational overhead of path planners. In this paper, we propose OMEGA, which combines OccMamba with an efficient AGR-Planner to address these problems. OccMamba adopts a novel architecture that separates semantic and occupancy prediction into independent branches and incorporates two Mamba blocks within them. These blocks efficiently extract semantic and geometric features in 3D environments with linear complexity, ensuring that the network can learn long-distance dependencies to improve prediction accuracy. The semantic and geometric features are then fused in Bird's Eye View (BEV) space to minimize the computational overhead of feature fusion. The resulting semantic occupancy map is seamlessly integrated into the local map, providing occlusion awareness of the dynamic environment. Our AGR-Planner uses this local map and employs kinodynamic A* search and gradient-based trajectory optimization, guaranteeing planning that is ESDF-free and energy-efficient. Extensive experiments demonstrate that OccMamba outperforms state-of-the-art 3D semantic occupancy networks, achieving 25.0% mIoU. End-to-end navigation experiments in dynamic scenes verify OMEGA's efficiency, achieving a 96% average planning success rate. Code and video are available at https://jmwang0117.github.io/OMEGA/.
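The linear-complexity property attributed to the Mamba blocks comes from a selective-scan state-space recurrence: the sequence is processed in a single pass, so cost grows linearly with length, unlike quadratic attention. Below is a minimal, hypothetical toy sketch of such a scan on a 1-D sequence (OccMamba's actual blocks operate on 3D features and learn their gates; the function name and constant gates here are illustrative assumptions, not the paper's implementation):

```python
def selective_scan(xs, a_gates, b_gates):
    """Toy gated state-space recurrence h_t = a_t * h_{t-1} + b_t * x_t.

    One forward pass over the sequence => O(L) time in sequence length L.
    xs:      input features (list of floats)
    a_gates: per-step state-decay gates
    b_gates: per-step input gates
    """
    h = 0.0
    outputs = []
    for x, a, b in zip(xs, a_gates, b_gates):
        h = a * h + b * x   # decay the old state, inject the new input
        outputs.append(h)   # emit the current state as the output feature
    return outputs

# Toy example: a 4-step sequence with constant gates.
ys = selective_scan([1.0, 2.0, 3.0, 4.0], [0.5] * 4, [1.0] * 4)
print(ys)  # [1.0, 2.5, 4.25, 6.125]
```

Because each output depends on the running state `h`, information from early steps can still influence late outputs, which is how such scans capture long-distance dependencies without pairwise attention.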