Face video restoration from degraded observations is challenging because it requires simultaneously recovering visual fidelity, temporal consistency, and subject identity. Existing approaches are typically either reference-free, which can lead to identity loss when the degradation removes person-specific facial details, or subject-specific, which limits generalization to unseen identities. We propose a subject-agnostic, reference-guided framework for identity-preserving face video restoration. Our method introduces bimodal perceptual-descriptive identity conditioning into a pretrained flow-based text-to-video generator and employs a two-stage training strategy to strengthen identity guidance during restoration. Experiments show that our approach improves restoration fidelity, temporal consistency, and identity preservation, achieving superior performance under challenging video degradations, including downsampling, blur, noise, and compression artifacts. The code is available at https://github.com/batuhanntosun/RG-FVR.