Location information is pivotal for the automation and intelligence of terminal devices and edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse IoT applications remains challenging due to significant training costs and the need for densely collected data. To address these issues, we apply the selective state space model (SSM) to visual localization, introducing a new model named MambaLoc. The proposed model achieves exceptional training efficiency by capitalizing on the SSM's strengths in efficient feature extraction, rapid computation, and memory optimization, and its parameter sparsity further ensures robustness in sparse-data environments. Additionally, we propose the Global Information Selector (GIS), which leverages the selective SSM to implicitly realize the efficient global feature extraction of Non-local Neural Networks. This design combines the computational efficiency of the SSM with the Non-local Neural Networks' ability to capture long-range dependencies using minimal layers. Consequently, the GIS captures global information effectively while significantly accelerating convergence. Extensive experiments on public indoor and outdoor datasets first demonstrate the model's effectiveness, and then its versatility when combined with various existing localization models. Our code and models are publicly available to support further research and development in this area.
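To make the core mechanism concrete, the sketch below illustrates a generic selective SSM scan of the kind MambaLoc builds on: the state transition is made input-dependent ("selective"), so the recurrence can keep or discard information per token while running in linear time over the sequence. This is a minimal illustration under assumed shapes and projection weights (`W_delta`, `W_B`, `W_C`, `A` are placeholders), not the authors' MambaLoc or GIS implementation.

```python
import numpy as np

def selective_scan(x, W_delta, W_B, W_C, A):
    """Minimal 1-D selective SSM scan (Mamba-style), for illustration.

    x: (T, D) input sequence; A: (D, N) continuous-time state matrix.
    The per-step projections make the transition input-dependent,
    which is what "selective" refers to.
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))          # hidden state, one N-dim state per channel
    y = np.empty_like(x)
    for t in range(T):
        # Input-dependent step size (softplus keeps it positive) and B, C.
        delta = np.log1p(np.exp(x[t] @ W_delta))       # (D,)
        B = x[t] @ W_B                                  # (N,)
        C = x[t] @ W_C                                  # (N,)
        # Discretize A per step, then run the linear recurrence.
        A_bar = np.exp(delta[:, None] * A)              # (D, N)
        h = A_bar * h + (delta[:, None] * x[t][:, None]) * B[None, :]
        y[t] = h @ C                                    # read out: (D,)
    return y

rng = np.random.default_rng(0)
T, D, N = 8, 4, 3
x = rng.standard_normal((T, D))
W_delta = 0.1 * rng.standard_normal((D, D))
W_B = 0.1 * rng.standard_normal((D, N))
W_C = 0.1 * rng.standard_normal((D, N))
A = -np.exp(rng.standard_normal((D, N)))   # negative real part for stability
y = selective_scan(x, W_delta, W_B, W_C, A)
print(y.shape)
```

Because later outputs depend on the whole history through `h`, a stack of such layers can propagate information across an entire feature sequence, which is the property the GIS exploits to approximate the global receptive field of a Non-local block at linear cost.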