Unsupervised learning of facial representations has attracted growing attention because it enables face understanding without heavy reliance on large-scale annotated datasets. However, the problem remains unsolved because facial identity, expression, and external factors such as pose and lighting are coupled. Prior methods focus primarily on 2D factors and pixel-level consistency, which leads to incomplete disentanglement and suboptimal performance on downstream tasks. In this paper, we propose LatentFace, a novel unsupervised disentangling framework for facial expression and identity representation. We argue that disentanglement should be performed in latent space, and we propose a solution based on a 3D-aware latent diffusion model. First, we introduce a 3D-aware autoencoder that encodes face images into 3D latent embeddings. Second, we propose a novel representation diffusion model (RDM) that disentangles the 3D latent embeddings into facial identity and expression. As a result, our method achieves state-of-the-art performance in facial expression recognition and face verification among unsupervised facial representation learning models. Code is available at \url{https://github.com/ryanhe312/LatentFace}.