Geometric navigation is now a well-established field of robotics, and the research focus is shifting towards higher-level scene understanding such as semantic mapping. To interact with its environment, a robot must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localizing objects within a map that is either still under construction (SLAM) or already built. To explore this direction, we propose a framework that autonomously detects and localizes predefined objects in a known environment using multi-modal sensor fusion, combining RGB and depth data from an RGB-D camera with lidar measurements. The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). Experiments show that the proposed framework accurately detects 98% of the objects in the real sample environment without post-processing, whereas 85% and 80% of the objects were mapped using the RGB-D camera alone or the RGB + lidar setup, respectively. The comparison with single-sensor (camera or lidar) experiments shows that sensor fusion allows the robot to accurately detect both near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.
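The depth-fusion and measurement-stabilization elements described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion rule, so inverse-variance weighting is swapped in here as one plausible choice, and the noise models (`rgbd_sigma`, `lidar_sigma`), the `ObjectTrack` class, and the sliding-median window are all hypothetical names and values chosen for illustration. The sketch assumes that RGB-D depth noise grows with range while lidar range noise stays roughly constant, which is one way to see why fusion could help for both near and far objects.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Optional


# Hypothetical noise models in metres (assumptions, not taken from the paper).
def rgbd_sigma(z: float) -> float:
    """Assumed RGB-D depth noise: grows quadratically with range."""
    return 0.002 + 0.003 * z * z


def lidar_sigma(z: float) -> float:
    """Assumed lidar range noise: roughly constant over range."""
    return 0.03


def fuse_depth(z_rgbd: Optional[float], z_lidar: Optional[float]) -> Optional[float]:
    """Inverse-variance weighted average of whichever depth readings are available."""
    readings = []
    if z_rgbd is not None:
        readings.append((z_rgbd, rgbd_sigma(z_rgbd) ** 2))
    if z_lidar is not None:
        readings.append((z_lidar, lidar_sigma(z_lidar) ** 2))
    if not readings:
        return None  # no sensor saw the object in this frame
    weights = [1.0 / var for _, var in readings]
    weighted_sum = sum(w * z for w, (z, _) in zip(weights, readings))
    return weighted_sum / sum(weights)


@dataclass
class ObjectTrack:
    """Stabilizes the depth of one detected object with a sliding median."""
    label: str
    window: int = 5
    history: List[float] = field(default_factory=list)

    def update(self, depth: float) -> float:
        # Keep only the most recent `window` fused depths.
        self.history.append(depth)
        self.history = self.history[-self.window:]
        # The median suppresses isolated outliers between frames.
        return median(self.history)


if __name__ == "__main__":
    track = ObjectTrack(label="chair")
    # Pairs of (RGB-D depth, lidar depth) for successive frames; None = no reading.
    for z_cam, z_laser in [(0.50, 0.60), (0.52, None), (None, 0.58)]:
        fused = fuse_depth(z_cam, z_laser)
        if fused is not None:
            print(track.label, round(track.update(fused), 3))
```

Under these assumed noise models the fused estimate leans on the camera at short range and on the lidar at long range. The sliding median is only one simple way to filter measurements; a Kalman filter or similar estimator would be an equally reasonable substitute.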