Independently Keypoint Learning for Small Object Semantic Correspondence

Semantic correspondence, the task of establishing correspondences between a pair of images of the same category or of similar scenes, remains challenging because of large intra-class appearance variation. In this paper, we introduce a novel problem called "Small Object Semantic Correspondence (SOSC)." The problem is challenging because the keypoints of a small object lie close together, so their features fuse; the corresponding keypoints are then difficult to identify and to distinguish from one another. To address this challenge, we propose the Keypoint Bounding box-centered Cropping (KBC) method, which increases the spatial separation between the keypoints of small objects and thereby facilitates independent learning of these keypoints. The KBC method is integrated into our proposed inference pipeline and can be easily incorporated into other methods, yielding significant performance gains. Additionally, we introduce a novel framework, named KBCNet, which serves as our baseline model. KBCNet comprises a Cross-Scale Feature Alignment (CSFA) module and an efficient 4D convolutional decoder. The CSFA module aligns multi-scale features, enriching keypoint representations by integrating fine-grained features with deep semantic features. The decoder, built on efficient 4D convolution, ensures efficiency and rapid convergence. To validate the effectiveness of the proposed method empirically, we conduct extensive experiments on three widely used benchmarks: PF-PASCAL, PF-WILLOW, and SPair-71k. Our KBC method yields a substantial performance improvement of 7.5% on the SPair-71k dataset, providing compelling evidence of its efficacy.
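
To illustrate why cropping around a keypoint bounding box increases the spatial separation between keypoints, the following is a minimal sketch and not the authors' implementation. The function names `kbc_crop_box` and `remap_keypoints`, the `margin` padding rule, and the fixed output size are all assumptions made for this example; the abstract does not specify how the crop is computed or resized.

```python
import math

def kbc_crop_box(keypoints, image_size, margin=0.25):
    """Return (x0, y0, x1, y1): a padded box around the keypoints, clipped to the image.

    Hypothetical helper: the padding rule (a fraction of the box size,
    at least 1 pixel) is an assumption, not taken from the paper.
    """
    width, height = image_size
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    pad_x = max(margin * (max(xs) - min(xs)), 1.0)
    pad_y = max(margin * (max(ys) - min(ys)), 1.0)
    x0 = max(math.floor(min(xs) - pad_x), 0)
    y0 = max(math.floor(min(ys) - pad_y), 0)
    x1 = min(math.ceil(max(xs) + pad_x) + 1, width)   # exclusive right edge
    y1 = min(math.ceil(max(ys) + pad_y) + 1, height)  # exclusive bottom edge
    return x0, y0, x1, y1

def remap_keypoints(keypoints, box, out_size):
    """Shift keypoints into crop coordinates, then rescale to out_size (width, height).

    Simplification: each axis is scaled independently, so the aspect
    ratio of the crop is not preserved.
    """
    x0, y0, x1, y1 = box
    sx = out_size[0] / (x1 - x0)
    sy = out_size[1] / (y1 - y0)
    return [((x - x0) * sx, (y - y0) * sy) for x, y in keypoints]

# Three keypoints of a small object, only a few pixels apart in a 300x200 image.
keypoints = [(100, 80), (104, 82), (108, 80)]
box = kbc_crop_box(keypoints, image_size=(300, 200))
remapped = remap_keypoints(keypoints, box, out_size=(256, 256))

# Distance between the first two keypoints before and after crop-and-resize.
before = math.dist(keypoints[0], keypoints[1])
after = math.dist(remapped[0], remapped[1])
print(box, round(before, 2), round(after, 2))
```

In this toy setting the keypoints occupy a 13x5 pixel crop, so resizing that crop to the network input resolution spreads them over many more pixels, which is the effect the abstract attributes to KBC.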
