Portrait stylization, which translates a real human face image into an artistically stylized image, has attracted considerable interest, and many prior works have achieved impressive quality in recent years. However, despite their remarkable performance on image-level translation tasks, prior methods produce unsatisfactory results when applied to the video domain. To address this issue, we propose a novel two-stage video translation framework with an objective function that forces the model to generate a temporally coherent stylized video while preserving the context of the source video. Furthermore, our model runs in real time with a latency of 0.011 seconds per frame (roughly 90 frames per second) and requires only 5.6M parameters, making it widely applicable to practical, real-world applications.