Unlike current deep keypoint detectors, which are trained to recognize a limited number of body parts, few-shot keypoint detection (FSKD) attempts to localize any keypoints, whether novel or base, depending on the reference samples. FSKD requires semantically meaningful relations for keypoint similarity learning in order to overcome ubiquitous noise and ambiguous local patterns. The vision transformer (ViT) offers one remedy, as it captures long-range relations well. However, because of its global attention matrix, ViT may model irrelevant features outside the region of interest, which degrades similarity learning between support and query features. In this paper, we present a novel saliency-guided vision transformer, dubbed SalViT, for few-shot keypoint detection. SalViT features a uniquely designed masked self-attention and a morphology learner: the former introduces a saliency map as a soft mask that constrains self-attention to the foreground, while the latter uses so-called power normalization to adjust the morphology of the saliency map, realizing a "dynamically changing receptive field". Moreover, as saliency detectors add computation, we show that the attention masks of the DINO transformer can replace saliency maps. On top of SalViT, we also investigate i) transductive FSKD, which enhances keypoint representations with unlabelled data, and ii) FSKD under occlusions. We show that our model performs well on five public datasets and achieves roughly 10% higher PCK than a normally trained model under severe occlusions.
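The two components described above can be illustrated with a minimal, dependency-free sketch. The abstract does not give SalViT's exact formulas, so the following are assumptions rather than the authors' implementation: the soft mask is applied by multiplying each key token's post-softmax attention weight by its saliency value and renormalizing (for strictly positive masks this equals adding log-saliency to the logits), and power normalization is taken to be the simple elementwise power x**gamma on a saliency map in [0, 1]. The function names `power_normalize` and `saliency_masked_attention` are hypothetical. Per the abstract, a DINO attention map could be passed as `mask` in place of a saliency map.

```python
import math


def power_normalize(saliency, gamma):
    """Hypothetical morphology adjustment: elementwise x**gamma on values clipped to [0, 1].

    gamma < 1 lifts weak responses (the mask "grows", enlarging the receptive field);
    gamma > 1 suppresses them (the mask "shrinks").
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [min(max(s, 0.0), 1.0) ** gamma for s in saliency]


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def saliency_masked_attention(q, k, v, mask, eps=1e-8):
    """Single-head self-attention in which key token j is re-weighted by mask[j].

    q, k, v: lists of N token vectors; mask: N soft-mask values in [0, 1].
    Assumption: the mask multiplies post-softmax weights, which are then renormalized.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled dot-product logits between this query and every key.
        logits = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(logits)
        # Soft mask: damp attention to low-saliency (background) keys, then renormalize.
        masked = [w * m for w, m in zip(weights, mask)]
        total = sum(masked) + eps
        masked = [w / total for w in masked]
        # Weighted sum of value vectors, one output channel at a time.
        out.append([sum(w * vj[c] for w, vj in zip(masked, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    saliency = [0.9, 0.1, 0.0]  # the third token is pure background
    mask = power_normalize(saliency, gamma=0.5)
    print(saliency_masked_attention(tokens, tokens, tokens, mask))
```

With a zero mask value, the background token contributes nothing to any output, whereas plain global attention would still mix it in. Lowering `gamma` widens the set of tokens that receive non-negligible attention, which is one way to read the "dynamically changing receptive field".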