Model pre-training has become essential in various recognition tasks. Meanwhile, with the remarkable advances in image generation models, pre-training methods that use generated images have emerged, since such models can produce unlimited training data. However, while existing methods based on generated images excel in classification, they fall short in more practical tasks such as human pose estimation. In this paper, we demonstrate this shortfall experimentally and propose generating visually distinct images with identical human poses. We then propose a novel multi-positive contrastive learning method, which optimally utilizes the generated images to learn structural features of the human body. We term the entire learning pipeline GenPoCCL. Despite using less than 1% of the data used by the current state-of-the-art method, GenPoCCL captures structural features of the human body more effectively and surpasses existing methods on a variety of human-centric perception tasks.
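To make the idea of multi-positive contrastive learning concrete, the following is a minimal sketch of a generic multi-positive contrastive loss, in which all images generated from the same pose are treated as positives for one another. The abstract does not specify the exact GenPoCCL loss, so this is an illustrative, commonly used formulation rather than the authors' method. The names `multi_positive_contrastive_loss`, `pose_ids`, and `temperature` are assumptions introduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def multi_positive_contrastive_loss(embeddings, pose_ids, temperature=0.1):
    """Generic multi-positive contrastive loss (illustrative, not the paper's exact loss).

    embeddings: list of feature vectors, one per image.
    pose_ids:   hypothetical labels; images sharing a pose_id were generated
                from the same pose and count as positives for each other.
    For each anchor, the loss is the mean negative log-probability of its
    positives under a softmax over similarities to all other samples.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if pose_ids[j] == pose_ids[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in others
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    if counted == 0:
        raise ValueError("no anchor has a positive; check pose_ids")
    return total / counted


if __name__ == "__main__":
    # Toy 2-D features: the first two vectors are close, as are the last two.
    feats = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]]
    matched = multi_positive_contrastive_loss(feats, [0, 0, 1, 1])
    mismatched = multi_positive_contrastive_loss(feats, [0, 1, 0, 1])
    print("loss when same-pose features agree:   ", matched)
    print("loss when same-pose features disagree:", mismatched)
```

In this sketch the loss is small when features of same-pose images are close and large otherwise, which is the pressure that would push an encoder toward pose-related structure rather than appearance.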