Achieving precise visual localization in GPS-limited urban environments poses significant challenges for mobile platforms operating under strict bandwidth, memory, and processing limitations. Inspired by mammalian spatial cognition, we propose a task-oriented communication framework in which bandwidth-limited endpoints equipped with multi-camera systems extract compact multi-view features and offload localization to collaborative edge servers. We introduce the Orthogonally-constrained Variational Information Bottleneck encoder (O-VIB), which incorporates automatic relevance determination (ARD) to prune non-informative features while enforcing orthogonality to minimize redundancy, enabling efficient and accurate localization with minimal transmission overhead. Extensive evaluation on a real-world urban localization dataset demonstrates that O-VIB achieves high-precision localization under stringent bandwidth budgets, outperforming existing methods across diverse communication constraints.
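To make the two regularizers concrete, the following minimal sketch computes the per-dimension ARD-style KL rate of a Gaussian bottleneck code and an orthogonality penalty on the encoder projection. The linear encoder, shapes, and trade-off weights are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: d_in-dim fused multi-view feature, k-dim bottleneck code.
d_in, k = 128, 16
W = rng.standard_normal((k, d_in)) / np.sqrt(d_in)  # encoder projection (assumed linear)
x = rng.standard_normal(d_in)                        # one fused multi-view feature vector

# Gaussian bottleneck: mean from the projection, per-dimension log-variance
# (a learned quantity in practice; fixed here for the sketch).
mu = W @ x
log_var = np.full(k, -2.0)

# ARD-style rate: each latent dimension pays its own KL against a unit Gaussian
# prior, KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
# Dimensions whose KL stays near zero carry no information and can be pruned,
# which is what shrinks the transmitted payload.
kl_per_dim = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)
rate = kl_per_dim.sum()

# Orthogonality penalty: push the rows of W toward an orthonormal frame,
# ||W W^T - I||_F^2, so latent dimensions encode non-redundant information.
gram = W @ W.T
ortho_penalty = np.sum((gram - np.eye(k)) ** 2)

# Hypothetical trade-off weights; the full objective would combine
# task (localization) loss + beta * rate + lam * ortho_penalty.
beta, lam = 1e-3, 1e-2
```

Under this view, pruning a dimension is simply dropping it from transmission once its KL contribution is negligible, and the orthogonality term keeps the surviving dimensions from duplicating one another.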