Facial expression manipulation aims to change human facial expressions without affecting face recognition. To transform facial expressions into target expressions, previous methods relied on expression labels to guide the manipulation process. However, these methods failed to preserve the details of facial features, which weakens or destroys identity information in the output image. In this work, we propose WEM-GAN, short for Wavelet-based Expression Manipulation GAN, which focuses on preserving the details of the original image during editing. First, we combine the wavelet transform with a generator built on a U-net autoencoder backbone, improving the generator's ability to preserve fine facial-feature details. Second, we introduce a high-frequency component discriminator and a high-frequency domain adversarial loss to further constrain the optimization of our model, giving the generated face image richer detail. Additionally, to narrow the gap between generated and target expressions, we use residual connections between the encoder and decoder and apply relative action units (AUs) multiple times. Extensive qualitative and quantitative experiments demonstrate that our model better preserves identity features, edits more effectively, and generates higher-quality images on the AffectNet dataset, and that it achieves superior performance on metrics such as Average Content Distance (ACD) and Expression Distance (ED).
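The abstract does not specify how the wavelet decomposition is computed, so as a minimal sketch (the function name `haar_dwt2` and the plain-NumPy formulation are our own, not the paper's), the single-level 2-D Haar transform that a high-frequency component discriminator could operate on might look like:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar wavelet transform (illustrative sketch).

    Splits an even-sized grayscale image into a low-frequency
    approximation (LL) and three high-frequency detail subbands
    (LH, HL, HH), each at half the input resolution. A high-frequency
    discriminator would consume the LH/HL/HH bands.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # approximation (low frequency)
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

# Sanity check: a flat image carries no high-frequency content,
# so all three detail subbands are exactly zero.
flat = np.full((4, 4), 7.0)
ll, lh, hl, hh = haar_dwt2(flat)
print(np.allclose(lh, 0) and np.allclose(hl, 0) and np.allclose(hh, 0))  # True
```

In practice the decomposition can be applied recursively to `ll` for a multi-level transform, and the detail bands stacked as extra input channels for the discriminator.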