Camera relocalization relies on 3D scene models whose large memory footprint is incompatible with the memory budgets of many applications. One way to reduce the scene memory size is map compression, which removes a subset of the 3D points and quantizes the descriptors. This achieves high compression ratios but causes a performance drop due to information loss. To address this memory-performance trade-off, we train a lightweight, scene-specific auto-encoder network that performs descriptor quantization-dequantization in an end-to-end differentiable manner, updating both the product quantization centroids and the network parameters through back-propagation. In addition to optimizing the network for descriptor reconstruction, we encourage it to preserve descriptor-matching performance with margin-based metric loss functions. Results show that, for a local descriptor memory of only 1 MB, the synergistic combination of the proposed network and map compression achieves the best performance on the Aachen Day-Night dataset compared to existing compression methods.
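The product quantization step at the core of the compression can be illustrated with a minimal sketch. This is a generic PQ quantize/dequantize round trip in NumPy, not the paper's trained auto-encoder: the descriptor dimension (128), number of subspaces `M`, and codebook size `K` are illustrative assumptions. In the paper's setting the codebooks would additionally receive gradients (e.g. via a straight-through estimator) so the centroids can be updated by back-propagation.

```python
import numpy as np

def pq_quantize(desc, codebooks):
    """Assign each subvector of `desc` to its nearest sub-codebook centroid.

    desc:      (D,) descriptor, split into M subvectors of length D // M.
    codebooks: (M, K, D // M) array of K centroids per subspace.
    Returns:   (M,) integer codes -- the compressed representation.
    """
    M, K, d = codebooks.shape
    subs = desc.reshape(M, d)
    # Nearest centroid per subspace (non-differentiable argmin; a trained
    # network would bypass this step's gradient with a straight-through trick).
    return np.array([
        np.argmin(np.linalg.norm(codebooks[m] - subs[m], axis=1))
        for m in range(M)
    ])

def pq_dequantize(codes, codebooks):
    """Reconstruct an approximate descriptor by concatenating the selected centroids."""
    return np.concatenate([codebooks[m, codes[m]] for m in range(len(codes))])

# Illustrative sizes: a 128-D float32 descriptor costs 512 bytes;
# with M = 8 subspaces and K = 16 centroids it compresses to 8 small codes.
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((8, 16, 16))          # (M, K, D // M)
desc = rng.standard_normal(128).astype(np.float64)     # a raw local descriptor
codes = pq_quantize(desc, codebooks)                   # compressed map entry
recon = pq_dequantize(codes, codebooks)                # lossy reconstruction
```

The reconstruction error of `recon` relative to `desc` is exactly the information loss the proposed auto-encoder is trained to reduce, while the margin-based losses keep matching behavior intact.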