Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

Current image-based keypoint detection methods for animal (including human) bodies and faces generally fall into two groups: fully supervised approaches and few-shot class-agnostic approaches. The former typically relies on laborious and time-consuming manual annotation, which makes it difficult to extend keypoint detection to a broader range of keypoint categories and animal species. The latter, though less dependent on extensive manual input, still requires annotated support images for reference during testing. To realize zero-shot keypoint detection without any prior annotation, we introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which uses text prompts to identify arbitrary keypoints across any species. In pursuit of this goal, we have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM). This framework combines vision and language models, creating an interplay between language features and local keypoint visual features. KDSM integrates Domain Distribution Matrix Matching (DDMM) and other special modules, such as the Vision-Keypoint Relational Awareness (VKRA) module, to improve the framework's generalizability and overall performance. Our comprehensive experiments demonstrate that KDSM significantly outperforms the baseline on the OVKD task. Moreover, although it operates in a zero-shot fashion, our method still yields results comparable to state-of-the-art few-shot species class-agnostic keypoint detection methods. We will make the source code publicly accessible.
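The interplay between language features and local keypoint visual features can be illustrated with a toy example. The sketch below is not the KDSM architecture, and it does not model DDMM or VKRA; it only shows one simple way such matching could work, assuming cosine similarity as the matching score and an argmax over the resulting heatmap as the predicted location. The function names `cosine` and `match_keypoint`, the prompt, and all feature values are hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_keypoint(text_feature, feature_map):
    """Score every spatial location against a text feature.

    feature_map is a grid (list of rows) of per-location feature vectors.
    Returns the (row, col) with the highest similarity and the full heatmap.
    This is an assumed, simplified matching rule, not the paper's method.
    """
    heatmap = [[cosine(text_feature, cell) for cell in row] for row in feature_map]
    best = max(
        ((r, c) for r in range(len(heatmap)) for c in range(len(heatmap[r]))),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    return best, heatmap


# Toy 2x2 visual feature map with 3-dimensional features (made-up values).
feature_map = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.7, 0.7, 0.0]],
]
# Hypothetical embedding of a text prompt such as "left eye of a cat".
text_feature = [0.0, 0.9, 0.1]

location, heatmap = match_keypoint(text_feature, feature_map)
print(location)
```

In a real system the text feature would come from a language encoder and the feature map from a vision backbone; here both are hand-written so the matching step can be read in isolation.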


