Visual Place Recognition (VPR) enables systems to identify previously visited locations within a map, a fundamental task for autonomous navigation. Prior works have developed VPR solutions using event cameras, which asynchronously measure per-pixel brightness changes with microsecond temporal resolution. However, these approaches rely on dense representations of the inherently sparse camera output and require tens to hundreds of milliseconds of event data to predict a place. Here, we break this paradigm with Flash, a lightweight VPR system that predicts places using sub-millisecond slices of event data. Our method is based on the observation that active pixel locations provide strong discriminative features for VPR. Flash encodes these active pixel locations using efficient binary frames and computes similarities via fast bitwise operations, which are then normalized based on the relative event activity in the query and reference frames. Flash improves Recall@1 for sub-millisecond VPR over existing baselines by 11.33x on the indoor QCR-Event-Dataset and 5.92x on the 8 km Brisbane-Event-VPR dataset. Moreover, our approach reduces the duration for which the robot must operate without awareness of its position, as evidenced by a localization latency metric we term Time to Correct Match (TCM). To the best of our knowledge, this is the first work to demonstrate sub-millisecond VPR using event cameras.
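The matching step described above can be made concrete with a minimal sketch. The snippet below is illustrative only: the function names, the use of NumPy's packed-bit representation, the example sensor resolution, and the specific normalization (dividing the bitwise overlap by the larger of the two frames' active-pixel counts) are assumptions for exposition, not the exact formulation used by Flash.

```python
import numpy as np

H, W = 260, 346  # example event-camera resolution (an assumption, e.g. DAVIS346)

def binary_frame(events_xy: np.ndarray) -> np.ndarray:
    """Encode a sub-millisecond event slice as a packed binary frame:
    a pixel is set if it produced at least one event in the slice."""
    frame = np.zeros((H, W), dtype=np.uint8)
    frame[events_xy[:, 1], events_xy[:, 0]] = 1
    return np.packbits(frame)  # 8 pixels per byte -> cheap bitwise ops

def popcount(bits: np.ndarray) -> int:
    """Number of set bits (active pixels) in a packed frame."""
    return int(np.unpackbits(bits).sum())

def similarity(query: np.ndarray, ref: np.ndarray) -> float:
    """Bitwise overlap of active pixels, normalized by relative event
    activity (here: the larger active-pixel count of the two frames;
    the exact normalization used by Flash may differ)."""
    overlap = popcount(query & ref)
    denom = max(popcount(query), popcount(ref))
    return overlap / denom if denom else 0.0

def predict_place(query: np.ndarray, reference_frames: list) -> int:
    """Return the index of the best-matching reference frame."""
    return int(np.argmax([similarity(query, r) for r in reference_frames]))
```

Packing active-pixel masks into bytes reduces each query-reference comparison to a handful of vectorized bitwise ANDs plus a population count, which is what makes matching sub-millisecond slices against a full reference map cheap.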