Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventional visible-light cameras under challenging conditions such as low illumination, overexposure, and high-speed motion. Recognizing the current scarcity of dedicated datasets in this domain, we introduce EPRBench, a high-quality benchmark specifically designed for event stream-based VPR. EPRBench comprises 10K event sequences and 65K event frames, collected using both handheld and vehicle-mounted setups to comprehensively capture real-world challenges across diverse viewpoints, weather conditions, and lighting scenarios. To support semantic-aware and language-integrated VPR research, we provide scene descriptions generated by large language models (LLMs) and subsequently refined through human annotation, establishing a solid foundation for integrating LLMs into event-based perception pipelines. To facilitate systematic evaluation, we implement and benchmark 15 state-of-the-art VPR algorithms on EPRBench, offering strong baselines for future algorithmic comparisons. Furthermore, we propose a novel multi-modal fusion paradigm for VPR: leveraging LLMs to generate textual scene descriptions from raw event streams, which then guide spatially attentive token selection, cross-modal feature fusion, and multi-scale representation learning. This framework not only achieves highly accurate place recognition but also produces interpretable reasoning alongside its predictions, significantly enhancing model transparency and explainability. The dataset and source code will be released at https://github.com/Event-AHU/Neuromorphic_ReID.
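To make the proposed fusion paradigm concrete, the sketch below illustrates one plausible reading of the pipeline named in the abstract: a text embedding of the LLM-generated scene description scores event-frame tokens for spatially attentive selection, the kept tokens are fused with the text feature via cross-attention, and the result is pooled into a retrieval descriptor. All module names, dimensions, and the top-k selection rule are hypothetical stand-ins for illustration, not the authors' released implementation.

```python
# Minimal illustrative sketch (PyTorch); hypothetical, not the official EPRBench code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextGuidedEventVPR(nn.Module):
    """Event tokens are scored against a text embedding of the scene
    description; the most relevant tokens are kept, fused with the text
    feature via cross-attention, and pooled into a place descriptor."""

    def __init__(self, dim=256, keep_ratio=0.5):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, event_tokens, text_embed):
        # event_tokens: (B, N, D) patch tokens from an event-frame backbone
        # text_embed:   (B, D)   sentence embedding of the LLM description
        B, N, D = event_tokens.shape

        # 1) Spatially attentive token selection: score each event token by
        #    similarity to the text embedding and keep the top-k tokens.
        scores = torch.einsum("bnd,bd->bn", event_tokens, text_embed)
        k = max(1, int(N * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                       # (B, k)
        kept = torch.gather(
            event_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D)
        )                                                          # (B, k, D)

        # 2) Cross-modal feature fusion: the text query attends to kept tokens.
        fused, _ = self.cross_attn(text_embed.unsqueeze(1), kept, kept)

        # 3) Pool into a global, L2-normalized place descriptor for retrieval.
        desc = self.proj(fused.squeeze(1) + kept.mean(dim=1))
        return F.normalize(desc, dim=-1)


# Usage: query/database descriptors are matched by cosine similarity,
# the standard retrieval protocol in VPR benchmarks.
model = TextGuidedEventVPR()
tokens = torch.randn(2, 196, 256)   # e.g., 14x14 ViT patch tokens
text = torch.randn(2, 256)
print(model(tokens, text).shape)    # torch.Size([2, 256])
```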