Semantic grids can be useful representations of the scene around an autonomous system. With information about the layout of the surrounding space, a robot can use this representation for crucial tasks such as navigation and tracking. Fusing information from multiple sensors can increase robustness and lower the computational load of the task, enabling real-time performance. Our multi-scale LiDAR-Aided Perspective Transform network uses information available in point clouds to guide the projection of image features to a top-view representation. On the nuScenes dataset, it improves on the state of the art for semantic grid generation by a relative +8.67% for the human class and +49.07% for the movable object class, achieves results close to the state of the art for the vehicle, drivable area and walkway classes, and performs inference at 25 FPS.
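The abstract does not describe how the LiDAR-aided transform works internally. As a rough, non-learned illustration of the general idea of using point clouds to place image features in a top-view grid, the sketch below projects each LiDAR point into the image, reads the feature under it, and averages features per ground cell. It assumes a calibrated pinhole camera and a known LiDAR-to-camera extrinsic. The function name `lidar_guided_bev`, the coordinate conventions and the grid parameters are hypothetical, and this single-scale sketch is not the authors' multi-scale network.

```python
import numpy as np


def lidar_guided_bev(image_feats, points_lidar, K, T_cam_from_lidar,
                     grid_range=50.0, cell_size=0.5):
    """Hypothetical sketch: lift image features to a top-view grid using LiDAR points.

    image_feats:      (C, H, W) image feature map
    points_lidar:     (N, 3) points in the LiDAR frame (x forward, y left, z up)
    K:                (3, 3) pinhole intrinsics scaled to the feature-map resolution
    T_cam_from_lidar: (4, 4) rigid transform from the LiDAR frame to the camera frame
    Returns a (C, n, n) grid covering [-grid_range, grid_range) in x and y.
    """
    C, H, W = image_feats.shape
    n = int(round(2 * grid_range / cell_size))

    # Transform the points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Discard points behind (or almost on) the camera plane.
    front = pts_cam[:, 2] > 1e-3
    pts_cam, pts_lidar = pts_cam[front], points_lidar[front]

    # Pinhole projection to integer feature-map pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Top-view cell of each point, from its ground-plane (x, y) position.
    ix = np.floor((pts_lidar[:, 0] + grid_range) / cell_size).astype(int)
    iy = np.floor((pts_lidar[:, 1] + grid_range) / cell_size).astype(int)

    # Keep only points that land inside both the image and the grid.
    ok = ((u >= 0) & (u < W) & (v >= 0) & (v < H)
          & (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n))
    u, v, ix, iy = u[ok], v[ok], ix[ok], iy[ok]

    # Gather the image feature under each point and average them per cell.
    feats = image_feats[:, v, u].T            # (M, C)
    cell = ix * n + iy                        # flattened cell index
    acc = np.zeros((n * n, C))
    cnt = np.zeros(n * n)
    np.add.at(acc, cell, feats)               # unbuffered sum handles repeated cells
    np.add.at(cnt, cell, 1.0)
    acc /= np.maximum(cnt, 1.0)[:, None]      # empty cells stay zero
    return acc.reshape(n, n, C).transpose(2, 0, 1)
```

Averaging per cell is only one possible pooling choice; a learned network would typically replace this fixed gather-and-average step with trainable layers and combine several feature scales.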