Localizing autonomous robots within prior maps is crucial to their operation. This paper presents InstaLoc, a solution to this problem for indoor environments that localizes an individual lidar scan within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking this approach, InstaLoc identifies object instances in the scene and matches them with those from a prior map. To the best of our knowledge, this is the first method to use panoptic segmentation, inferred directly on 3D lidar scans, for indoor localization. InstaLoc uses two networks based on spatially sparse tensors that perform inference directly on dense 3D lidar point clouds. The first is a panoptic segmentation network that produces object instances and their semantic classes. The second, smaller network produces a descriptor for each object instance. A consensus-based matching algorithm then matches the instances to the prior map and estimates a six degrees of freedom (DoF) pose for the input cloud in the prior map. Both networks are efficient: InstaLoc requires only one to two hours of training on a mobile GPU and runs in real time at 1 Hz. Compared with baseline methods, our method achieves two to four times more detections when localizing, and higher precision on those detections.
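The consensus step can be illustrated with a minimal sketch. The abstract does not specify how InstaLoc's matching algorithm works internally, so the code below swaps in a generic technique: RANSAC over candidate instance matches, with the 6 DoF pose recovered by the Kabsch (SVD) method from the centroids of matched instances. The function names `kabsch` and `consensus_pose`, the use of instance centroids, and the `threshold` and `iterations` values are all assumptions made for illustration, not details from the paper.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def consensus_pose(src, dst, threshold=0.2, iterations=200, seed=0):
    """Hypothetical consensus step (generic RANSAC, not the paper's algorithm).

    src: (N, 3) centroids of instances in the scan.
    dst: (N, 3) centroids of their candidate matches in the prior map.
    Returns (R, t, inlier_mask) for the pose with the largest consensus set.
    """
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, dtype=bool)
    for _ in range(iterations):
        # Three correspondences are the minimum for a rigid 3D transform.
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no consistent set of at least three matches found")
    # Refit on every match that agrees with the best hypothesis.
    R, t = kabsch(src[best], dst[best])
    return R, t, best
```

Given candidate matches produced by comparing instance descriptors, `consensus_pose(scan_centroids, map_centroids)` would return a rotation, a translation, and a mask of the matches that agree with that pose. Wrong matches fall outside the consensus set and do not influence the final refit.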