Supervised keypoint localization methods rely on large, manually labeled image datasets in which objects can deform, articulate, or be occluded. However, creating such large sets of keypoint labels is time-consuming and costly, and the labels are often error-prone because of inconsistent annotation. We therefore seek an approach that learns keypoint localization from fewer but consistently annotated images. To this end, we present a novel formulation that learns to localize semantically consistent keypoints, even in occluded regions, across varying object categories. We take a few user-labeled 2D images as input examples and extend them via self-supervision on a larger unlabeled dataset. Unlike in unsupervised methods, the few-shot images act as semantic shape constraints for object localization. Furthermore, we introduce 3D geometry-aware constraints that uplift the keypoints, yielding more accurate 2D localization. Our general-purpose formulation paves the way for semantically conditioned generative modeling and attains competitive or state-of-the-art accuracy on several datasets, including human faces, eyes, animals, and cars, as well as on a mouth interior (teeth) localization task that previous few-shot methods have not attempted. Project page: https://xingzhehe.github.io/FewShot3DKP/