Unsupervised learning of facial representations has gained increasing attention because it enables face understanding without relying heavily on large-scale annotated datasets. However, the problem remains unsolved due to the coupling of facial identity, expression, and external factors such as pose and lighting. Prior methods primarily focus on 2D factors and pixel-level consistency, leading to incomplete disentanglement and suboptimal performance in downstream tasks. In this paper, we propose LatentFace, a novel unsupervised disentangling framework for facial expression and identity representation. We argue that disentangling should be performed in latent space and propose a solution based on a 3D-aware latent diffusion model. First, we introduce a 3D-aware autoencoder to encode face images into 3D latent embeddings. Second, we propose a novel representation diffusion model (RDM) to disentangle the 3D latent into facial identity and expression. Consequently, our method achieves state-of-the-art performance in facial expression recognition and face verification among unsupervised facial representation learning models.