Human face synthesis and manipulation are increasingly important in entertainment and AI, with growing demand for highly realistic, identity-preserving images even when only unpaired, unaligned datasets are available. We study unpaired face manipulation via adversarial learning, moving from autoencoder baselines to a robust, guided CycleGAN framework. While autoencoders capture coarse identity, they often miss fine details. Our approach integrates spectral normalization for stable training, identity- and perceptual-guided losses to preserve subject identity and high-level structure, and landmark-weighted cycle constraints to maintain facial geometry across pose and illumination changes. Experiments show that our adversarially trained CycleGAN improves realism (FID), perceptual quality (LPIPS), and identity preservation (ID-Sim) over autoencoder baselines, with competitive cycle-reconstruction SSIM and practical inference times; it achieves high quality without paired datasets and approaches pix2pix on curated paired subsets. These results demonstrate that guided, spectrally normalized CycleGANs provide a practical path from autoencoders to robust unpaired face manipulation.
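To make the landmark-weighted cycle constraint concrete, the following is a minimal NumPy sketch of one plausible formulation: an L1 cycle-reconstruction loss whose per-pixel weights peak near facial landmarks via a Gaussian kernel. The function name, the Gaussian weighting scheme, and the parameter values (`sigma`, `base`, `peak`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def landmark_weighted_cycle_loss(x, x_cycled, landmarks, sigma=8.0, base=1.0, peak=4.0):
    """Hypothetical landmark-weighted L1 cycle loss (sketch, not the authors' code).

    x, x_cycled : (H, W, C) float arrays, original and cycle-reconstructed images.
    landmarks   : list of (row, col) facial landmark coordinates.
    Pixels near landmarks are weighted up to `peak`; elsewhere the weight decays
    toward `base`, so geometry-critical regions dominate the cycle constraint.
    """
    h, w = x.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    weight = np.full((h, w), base, dtype=np.float64)
    for (ly, lx) in landmarks:
        # Gaussian bump centered at each landmark; take the max over landmarks.
        bump = base + (peak - base) * np.exp(
            -((ys - ly) ** 2 + (xs - lx) ** 2) / (2.0 * sigma ** 2)
        )
        weight = np.maximum(weight, bump)
    # Weighted mean absolute error over all pixels and channels.
    return float(np.mean(weight[..., None] * np.abs(x - x_cycled)))
```

Under this weighting, a reconstruction error near an eye or mouth landmark is penalized up to `peak / base` times more than the same error in a background region, which is one way to bias the cycle constraint toward preserving facial geometry.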