Human face synthesis and manipulation are increasingly important in entertainment and AI, with growing demand for highly realistic, identity-preserving images even when only unpaired, unaligned datasets are available. We study unpaired face manipulation via adversarial learning, moving from autoencoder baselines to a robust, guided CycleGAN framework. While autoencoders capture coarse identity, they often miss fine details. Our approach integrates spectral normalization for stable training, identity- and perceptual-guided losses to preserve subject identity and high-level structure, and landmark-weighted cycle constraints to maintain facial geometry across pose and illumination changes. Experiments show that our adversarially trained CycleGAN improves realism (FID), perceptual quality (LPIPS), and identity preservation (ID-Sim) over autoencoder baselines, with competitive cycle-reconstruction SSIM and practical inference times; it achieves high quality without paired datasets and approaches pix2pix on curated paired subsets. These results demonstrate that guided, spectrally normalized CycleGANs provide a practical path from autoencoders to robust unpaired face manipulation.
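To make the landmark-weighted cycle constraint concrete, the following is a minimal NumPy sketch of one plausible formulation: an L1 cycle-reconstruction loss whose per-pixel weights peak near facial landmarks via a Gaussian kernel. The function name, the Gaussian weighting scheme, and the parameter values (`sigma`, `base`, `peak`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def landmark_weighted_cycle_loss(x, x_cycled, landmarks, sigma=8.0, base=1.0, peak=4.0):
    """Hypothetical landmark-weighted L1 cycle loss (sketch, not the authors' code).

    x, x_cycled : (H, W, C) float arrays, original and cycle-reconstructed images.
    landmarks   : list of (row, col) facial landmark coordinates.
    Pixels near landmarks are weighted up to `peak`; elsewhere the weight decays
    toward `base`, so geometry-critical regions dominate the cycle constraint.
    """
    h, w = x.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    weight = np.full((h, w), base, dtype=np.float64)
    for (ly, lx) in landmarks:
        # Gaussian bump centered at each landmark; take the max over landmarks.
        bump = base + (peak - base) * np.exp(
            -((ys - ly) ** 2 + (xs - lx) ** 2) / (2.0 * sigma ** 2)
        )
        weight = np.maximum(weight, bump)
    # Weighted mean absolute error over all pixels and channels.
    return float(np.mean(weight[..., None] * np.abs(x - x_cycled)))
```

Under this weighting, a reconstruction error near an eye or mouth landmark is penalized up to `peak / base` times more than the same error in a background region, which is one way to bias the cycle constraint toward preserving facial geometry.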