Face analysis tasks have a wide range of applications, but universal facial representations have been explored in only a few works. In this paper, we explore high-performance pre-training methods to boost face analysis tasks such as face alignment and face parsing. We propose a self-supervised pre-training framework, called \textbf{\it Mask Contrastive Face (MCF)}, that combines masked image modeling with a contrastive strategy specially adjusted for face domain tasks. To improve the quality of the facial representation, we use the feature map of a pre-trained visual backbone as a supervision target and use a partially pre-trained decoder for masked image modeling. To handle face identity during the pre-training stage, we further use random masks to build contrastive learning pairs. We conduct pre-training on the LAION-FACE-cropped dataset, a variant of LAION-FACE 20M, which contains more than 20 million face images from Internet websites. For efficient pre-training, we explore the pre-training performance of our framework on a small subset of LAION-FACE-cropped and verify its superiority under different pre-training settings. Our model pre-trained on the full pre-training dataset outperforms the state-of-the-art methods on multiple downstream tasks, achieving 0.932 NME$_{diag}$ on AFLW-19 face alignment and a 93.96 F1 score on LaPa face parsing. Code is available at https://github.com/nomewang/MCF.
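As an illustration of how random masks can yield contrastive learning pairs, the following is a minimal sketch and not the MCF implementation. It assumes an image is split into a fixed number of patches, that two independent random masks of the same image form a positive pair, and that a standard InfoNCE loss scores the pairs. The function names, the 196-patch grid, the 0.75 mask ratio, and the temperature of 0.1 are all hypothetical choices made for this example; the encoder that would turn masked views into embeddings is omitted.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def two_masked_views(num_patches, mask_ratio, rng):
    """Draw two independent masks for one image (assumed positive pair)."""
    return (random_patch_mask(num_patches, mask_ratio, rng),
            random_patch_mask(num_patches, mask_ratio, rng))


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE over a batch: row i of view_a should match row i of view_b.

    view_a and view_b are lists of embedding vectors, one per image,
    computed from the two masked views by some encoder (not shown).
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i against every candidate in view_b.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature
                  for b_j in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


rng = random.Random(0)
mask_a, mask_b = two_masked_views(num_patches=196, mask_ratio=0.75, rng=rng)
```

In this sketch the loss is low when the two views of each image map to nearby embeddings and high when they are matched to the wrong image, which is the behavior a contrastive objective is meant to encourage.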