Dynamic elements in a scene degrade camera localization accuracy because they violate the fundamental static-environment assumption of Simultaneous Localization and Mapping (SLAM) algorithms. Recent semantic SLAM systems for dynamic environments rely solely on 2D semantic information, rely solely on geometric information, or combine the two in a loosely coupled manner. In this paper, we introduce 3DS-SLAM (3D Semantic SLAM), a system tailored to dynamic scenes that uses visual 3D object detection. 3DS-SLAM is a tightly coupled algorithm that resolves semantic and geometric constraints sequentially. We design a 3D part-aware hybrid transformer for point cloud-based object detection to identify dynamic objects. We then propose a dynamic feature filter based on HDBSCAN clustering to extract objects with significant absolute depth differences. Compared with ORB-SLAM2, 3DS-SLAM achieves an average improvement of 98.01% across the dynamic sequences of the TUM RGB-D dataset. It also outperforms four other leading SLAM systems designed for dynamic environments.
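The depth-based filtering idea can be illustrated with a minimal sketch. The abstract does not specify how the filter is implemented, so everything below is an assumption made for illustration: the `Feature` type, the 2D box format, the `gap` and `min_cluster_size` parameters, and the rule that the nearest sizeable depth cluster inside a detection is the dynamic object. To stay dependency-free, a simple 1D gap-based clustering stands in for HDBSCAN; it is not the clustering used by 3DS-SLAM.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    u: float      # pixel column
    v: float      # pixel row
    depth: float  # depth in metres


# Hypothetical box format: (u_min, v_min, u_max, v_max) of a detected object.
Box = Tuple[float, float, float, float]


def cluster_depths(depths: List[float], gap: float) -> List[List[int]]:
    """Stand-in for HDBSCAN: sort the depths and start a new cluster
    whenever two consecutive values differ by more than `gap`.
    Returns clusters as lists of indices into `depths`, nearest first."""
    order = sorted(range(len(depths)), key=lambda i: depths[i])
    clusters: List[List[int]] = []
    for i in order:
        if clusters and depths[i] - depths[clusters[-1][-1]] <= gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def filter_dynamic_features(features: List[Feature], box: Box,
                            gap: float = 0.3,
                            min_cluster_size: int = 3) -> List[Feature]:
    """Drop features that lie inside `box` and belong to the nearest
    sizeable depth cluster (assumed here to be the dynamic object)."""
    inside = [i for i, f in enumerate(features)
              if box[0] <= f.u <= box[2] and box[1] <= f.v <= box[3]]
    if not inside:
        return list(features)

    clusters = cluster_depths([features[i].depth for i in inside], gap)
    # Clusters below the size threshold are treated as noise and ignored.
    sizeable = [c for c in clusters if len(c) >= min_cluster_size]
    if not sizeable:
        return list(features)

    # Assumption: the object is in front of the background it occludes.
    nearest = min(sizeable, key=lambda c: features[inside[c[0]]].depth)
    dynamic = {inside[j] for j in nearest}
    return [f for i, f in enumerate(features) if i not in dynamic]


# Made-up example: three features on a person about 1 m away, three on the
# wall behind them about 3 m away, and one feature outside the box.
features = [
    Feature(100, 100, 1.00), Feature(110, 120, 1.05), Feature(120, 130, 1.10),
    Feature(105, 105, 3.00), Feature(115, 125, 3.10), Feature(125, 140, 3.05),
    Feature(400, 50, 2.00),
]
kept = filter_dynamic_features(features, box=(90, 90, 130, 150))
```

In this toy input the large depth gap inside the box separates the object from the background, so only the object's features are discarded and the background features inside the box remain available for tracking. A density-based method such as HDBSCAN would avoid the fixed `gap` threshold used here.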