We introduce a lightweight network that improves the descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of the keypoints as input, and enhances the descriptors with an MLP-based self-boosting stage and a Transformer-based cross-boosting stage. The boosted descriptors can be either real-valued or binary. We use the proposed network to boost both hand-crafted descriptors (ORB, SIFT) and state-of-the-art learning-based descriptors (SuperPoint, ALIKE), and evaluate them on image matching, visual localization, and structure-from-motion tasks. The results show that our method significantly improves performance on each task, particularly in challenging cases such as large illumination changes or repetitive patterns. Our method requires only 3.2 ms on a desktop GPU and 27 ms on an embedded GPU to process 2000 features, which is fast enough for use in a practical system. The code and trained weights are publicly available at github.com/SJTU-ViSYS/FeatureBooster.
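To make the two-stage idea concrete, the following is a minimal NumPy sketch of the data flow, not the released implementation. It assumes that self-boosting encodes each keypoint's descriptor and geometry with separate MLPs and sums the results, that cross-boosting can be approximated by a single softmax self-attention layer over all keypoints of one image, and that a binary descriptor is obtained by thresholding at zero. The layer sizes, the four geometric features, the random untrained weights, and the names `mlp`, `init_mlp`, `self_attention`, and `boost` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply an MLP: ReLU between layers, no activation after the last one."""
    for i, (w, b) in enumerate(weights):
        x = x @ w + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def self_attention(x, wq, wk, wv):
    """Single-head softmax attention across all keypoints of one image."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

def boost(desc, geom, params, binary=False):
    """Boost N descriptors (N, D_in) using keypoint geometry (N, G_in)."""
    # Self-boosting (assumed form): per-keypoint encoding, no interaction.
    h = mlp(desc, params["desc_mlp"]) + mlp(geom, params["geom_mlp"])
    # Cross-boosting (assumed form): keypoints exchange context, with residual.
    h = h + self_attention(h, *params["attn"])
    out = mlp(h, params["head"])
    if binary:
        # Assumption: binarize by thresholding at zero.
        return (out > 0).astype(np.uint8)
    # Real-valued output: L2-normalize each descriptor.
    norm = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norm, 1e-12)

# Hypothetical sizes: 256-D input descriptors, 4 geometric features
# (e.g. x, y, scale, orientation), 128-D boosted descriptors.
D_IN, G_IN, D_MODEL, D_OUT = 256, 4, 128, 128
params = {
    "desc_mlp": init_mlp([D_IN, D_MODEL, D_MODEL]),
    "geom_mlp": init_mlp([G_IN, 32, D_MODEL]),
    "attn": [rng.normal(0.0, 1.0 / np.sqrt(D_MODEL), (D_MODEL, D_MODEL))
             for _ in range(3)],
    "head": init_mlp([D_MODEL, D_OUT]),
}

desc = rng.normal(size=(2000, D_IN))    # stand-in for extracted descriptors
geom = rng.uniform(size=(2000, G_IN))   # stand-in for keypoint geometry
real = boost(desc, geom, params)
bits = boost(desc, geom, params, binary=True)
```

With trained weights in place of the random ones, `real` or `bits` would replace the original descriptors in an otherwise unchanged matching pipeline. The attention variant used in the released code may differ from the plain softmax attention shown here.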