Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into fully supervised and few-shot class-agnostic approaches. The former typically rely on laborious, time-consuming manual annotation, which makes it difficult to extend keypoint detection to a broader range of keypoint categories and animal species. The latter depend less on extensive manual input but still require annotated support images for reference at test time. To realize zero-shot keypoint detection without any prior annotation, we introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which uses text prompts to identify arbitrary keypoints in any species. To this end, we develop a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM). The framework combines vision and language models so that language features interact with local keypoint visual features. KDSM further integrates Domain Distribution Matrix Matching (DDMM) and other dedicated modules, such as the Vision-Keypoint Relational Awareness (VKRA) module, to improve the framework's generalizability and overall performance. Comprehensive experiments demonstrate that KDSM significantly outperforms the baseline on the OVKD task. Notably, although it operates in a zero-shot fashion, our method yields results comparable to state-of-the-art few-shot, species class-agnostic keypoint detection methods. We will make the source code publicly accessible.
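To make the idea of matching language features against local visual features concrete, the following is a minimal sketch under stated assumptions, not the KDSM implementation. The abstract does not specify how the matching is computed, so the sketch substitutes plain cosine similarity between one text-prompt embedding and each spatial location of a visual feature map, then takes the highest-scoring location as the keypoint. It does not model DDMM or VKRA. The function names, the toy feature map, and the toy embedding are all hypothetical; in practice the embeddings would come from pretrained vision and language encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def similarity_heatmap(feature_map, text_embedding):
    """Cosine similarity between a text embedding and every spatial location.

    feature_map is an H x W grid (nested lists) of D-dimensional vectors,
    text_embedding is one D-dimensional vector. Both are assumed to live in
    a shared embedding space (an assumption of this sketch).
    """
    text_unit = l2_normalize(text_embedding)
    heatmap = []
    for row in feature_map:
        heat_row = []
        for feature in row:
            feature_unit = l2_normalize(feature)
            heat_row.append(sum(a * b for a, b in zip(feature_unit, text_unit)))
        heatmap.append(heat_row)
    return heatmap


def locate_keypoint(heatmap):
    """Return ((row, col), score) for the highest-scoring location."""
    score, position = max(
        ((value, (r, c))
         for r, row in enumerate(heatmap)
         for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return position, score


# Toy 2 x 3 feature map with 2-dimensional features (hypothetical values).
feature_map = [
    [[0.0, 1.0], [1.0, 1.0], [0.0, 2.0]],
    [[-1.0, 0.0], [1.0, 2.0], [3.0, 0.0]],
]
# Hypothetical embedding of a text prompt such as "left eye".
text_embedding = [1.0, 0.0]

heatmap = similarity_heatmap(feature_map, text_embedding)
position, score = locate_keypoint(heatmap)
print(position, score)
```

In this toy example the feature at row 1, column 2 points in the same direction as the text embedding, so it receives the highest similarity and is selected as the keypoint location. A different text prompt would produce a different heatmap over the same feature map, which is what lets a single model address keypoints it was never annotated for.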