Unsupervised learning of facial representations has gained increasing attention because it enables face understanding without heavy reliance on large-scale annotated datasets. However, the problem remains unsolved because facial identity, expression, and external factors such as pose and lighting are entangled. Prior methods primarily focus on 2D factors and pixel-level consistency, leading to incomplete disentanglement and suboptimal performance in downstream tasks. In this paper, we propose LatentFace, a novel unsupervised disentangling framework for facial expression and identity representation. We argue that disentangling should be performed in latent space and propose a solution based on a 3D-aware latent diffusion model. First, we introduce a 3D-aware autoencoder to encode face images into 3D latent embeddings. Second, we propose a novel representation diffusion model (RDM) to disentangle the 3D latent into facial identity and expression. As a result, our method achieves state-of-the-art performance in facial expression recognition and face verification among unsupervised facial representation learning models. Code is available at \url{https://github.com/ryanhe312/LatentFace}.