Unmanned Aerial Vehicles (UAVs) hold great potential for critical applications such as search and rescue, where accurate perception of indoor environments is paramount. However, performing localization, 3D reconstruction, and semantic segmentation concurrently remains a significant challenge, especially on UAVs with limited power and computational resources. This paper presents a novel approach to extracting and using semantic information in UAV operations. Our system integrates state-of-the-art visual SLAM, which estimates the full 6-DoF pose, with advanced object segmentation methods at the back end. To improve the computational and storage efficiency of the framework, we adopt OctoMap, a compact voxel-based 3D map representation. Furthermore, a fusion algorithm is incorporated to obtain, for each frame from the front-end SLAM task, the semantic information and the corresponding points. By leveraging semantic information, our framework enhances the UAV's ability to perceive and navigate indoor spaces, improving pose estimation accuracy and reducing uncertainty. We validate the proposed system in Gazebo simulations and deploy it on a Jetson Xavier AGX unit for real-world applications.
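The abstract does not specify the fusion algorithm, so the following is only a minimal sketch of one plausible way to fuse per-frame semantic labels into a voxel map. It assumes each frame provides a camera-to-world pose (a 4x4 homogeneous matrix from SLAM), a list of 3D points in the camera frame, and one segmentation label per point. The names `SemanticVoxelMap`, `integrate`, `label_at`, and the 0.1 m resolution are hypothetical, a plain dictionary stands in for OctoMap's octree, and majority voting is an illustrative choice rather than the authors' method.

```python
import math
from collections import Counter, defaultdict

VOXEL_SIZE = 0.1  # metres; hypothetical map resolution


def transform(pose, point):
    """Apply a 4x4 row-major camera-to-world pose to a 3D point."""
    x, y, z = point
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


def voxel_key(point, size=VOXEL_SIZE):
    """Map a world-frame point to the integer index of its voxel."""
    return tuple(math.floor(c / size) for c in point)


class SemanticVoxelMap:
    """Dictionary-backed stand-in for an octree; stores label votes per voxel."""

    def __init__(self, size=VOXEL_SIZE):
        self.size = size
        self.votes = defaultdict(Counter)

    def integrate(self, pose, points, labels):
        """Fuse one frame: move points to the world frame and vote per voxel."""
        for point, label in zip(points, labels):
            key = voxel_key(transform(pose, point), self.size)
            self.votes[key][label] += 1

    def label_at(self, world_point):
        """Return the majority label of the voxel containing world_point."""
        votes = self.votes.get(voxel_key(world_point, self.size))
        return votes.most_common(1)[0][0] if votes else None


# Usage: a camera translated 1 m along x observes the same spot three times.
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
semantic_map = SemanticVoxelMap()
semantic_map.integrate(pose, [(0.02, 0.02, 0.52)] * 3, ["chair", "chair", "wall"])
fused_label = semantic_map.label_at((1.05, 0.05, 0.55))
```

Accumulating votes per voxel lets a single mislabeled frame be outvoted by later observations, which is one simple way semantic fusion can reduce uncertainty. A real system would also track occupancy and bound memory through the octree structure.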