FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances

Visual effects production commonly requires both the creation of realistic synthetic humans and the retargeting of actors' performances to humanoid characters such as aliens and monsters. Achieving the expressive performances demanded in entertainment requires manipulating complex models with hundreds of parameters. Full creative control requires the freedom to make edits at any stage of the production, which rules out a fully automatic "black box" solution with uninterpretable parameters. On the other hand, producing realistic animation with these sophisticated models is difficult and laborious. This paper describes FDLS (Facial Deep Learning Solver), Weta Digital's solution to these challenges. FDLS adopts a coarse-to-fine, human-in-the-loop strategy that allows a solved performance to be verified and edited at several stages of the solving process. To train FDLS, we first transform the raw motion-capture data into robust graph features. Second, based on the observation that artists typically finalize the jaw-pass animation before proceeding to finer detail, we solve for the jaw motion first and then predict fine expressions with region-based networks conditioned on the jaw position. Finally, artists can optionally invoke a non-linear finetuning process on top of the FDLS solution so that it follows the motion-captured virtual markers as closely as possible. FDLS supports editing where needed to improve the deep learning solution, and it can handle small day-to-day changes in the actor's face shape. In many cases FDLS delivers reliable, production-quality performance solving with minimal training and little or no manual effort, while still allowing the solve to be guided and edited in unusual and difficult cases. The system has been under development for several years and has been used in major movies.
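The abstract gives no implementation details, so the following is only a minimal Python sketch, under our own assumptions, of two ideas it mentions: pose-robust graph features computed from markers, and a finetuning loop that starts from the network's solution and adjusts rig parameters so the rig's virtual markers follow the captured markers. The marker graph, the edge-length feature, the linear blendshape rig used as a stand-in for a production rig (which is nonlinear), the regularization weight `reg`, and the function names `edge_length_features`, `rig_markers`, and `finetune` are all hypothetical illustrations, not FDLS internals.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def edge_length_features(markers: Sequence[Vec3],
                         edges: Sequence[Tuple[int, int]]) -> List[float]:
    """Hypothetical graph feature: the length of each marker-graph edge.
    Lengths do not change under rigid head motion, which is one simple
    way to make features insensitive to head pose."""
    return [math.dist(markers[i], markers[j]) for i, j in edges]


def rig_markers(params: Sequence[float], neutral: Sequence[Vec3],
                deltas: Sequence[Sequence[Vec3]]) -> List[Vec3]:
    """Toy linear blendshape rig standing in for a real (nonlinear) rig:
    marker_k = neutral_k + sum_p params[p] * deltas[p][k]."""
    out = []
    for k, (x, y, z) in enumerate(neutral):
        for w, shape in zip(params, deltas):
            dx, dy, dz = shape[k]
            x, y, z = x + w * dx, y + w * dy, z + w * dz
        out.append((x, y, z))
    return out


def finetune(init: Sequence[float], captured: Sequence[Vec3],
             neutral: Sequence[Vec3], deltas: Sequence[Sequence[Vec3]],
             reg: float = 0.01, step: float = 0.1, iters: int = 500) -> List[float]:
    """Start from the network's solution `init` and run projected gradient
    descent on E(w) = sum_k |M(w)_k - c_k|^2 + reg * |w - init|^2,
    clamping each parameter to [0, 1] after every step."""
    w = list(init)
    for _ in range(iters):
        pred = rig_markers(w, neutral, deltas)
        # Residual between rig-predicted virtual markers and captured markers.
        residual = [tuple(a - b for a, b in zip(pk, ck))
                    for pk, ck in zip(pred, captured)]
        grad = []
        for p, shape in enumerate(deltas):
            # dM_k/dw_p = deltas[p][k] for a linear rig.
            g = 2.0 * sum(r[0] * d[0] + r[1] * d[1] + r[2] * d[2]
                          for r, d in zip(residual, shape))
            g += 2.0 * reg * (w[p] - init[p])  # stay near the editable solution
            grad.append(g)
        w = [min(1.0, max(0.0, wi - step * gi)) for wi, gi in zip(w, grad)]
    return w


# Usage: edge lengths survive a rigid rotation (90 degrees about z) plus translation.
markers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2), (0, 2)]
moved = [(-y + 5.0, x - 2.0, z + 3.0) for x, y, z in markers]
print(edge_length_features(markers, edges), edge_length_features(moved, edges))

# Usage: refine a rough initial guess toward weights that reproduce the markers.
deltas = [
    [(0.0, -0.5, 0.0), (0.0, -0.5, 0.0), (0.0, 0.0, 0.0)],  # hypothetical "jaw open"
    [(0.2, 0.0, 0.0), (-0.2, 0.0, 0.0), (0.0, 0.0, 0.0)],   # hypothetical "pucker"
]
true_w = [0.6, 0.3]
captured = rig_markers(true_w, markers, deltas)
solved = finetune([0.5, 0.2], captured, markers, deltas)
print(solved)
```

The regularizer keeps the refined parameters near the network's output, so the result stays close to an interpretable, artist-editable solution; lowering `reg` makes the solve follow the markers more tightly. With a real rig, the hand-written gradient would be replaced by the rig's own Jacobian or by automatic differentiation.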

翻译：视觉特效通常既需要创建逼真的合成人类，也需要将演员的表演重定向到类人角色（如外星人和怪物）上。实现娱乐行业所要求的表现力表演，需要操作包含数百个参数的复杂模型。完全的创作控制要求允许在制作的任何阶段进行编辑，这排除了使用参数不可解释的全自动"黑箱"解决方案的可能性。另一方面，使用这些复杂模型制作逼真的动画既困难又费力。本文描述了FDLS（面部深度学习求解器），这是维塔数码应对这些挑战的解决方案。FDLS采用由粗到精且引入人工参与的循环策略，允许在求解过程的多个阶段验证和编辑求解结果。为训练FDLS，我们首先将原始动作捕捉数据转换为鲁棒的图特征。其次，基于艺术家通常先完成下颌通道动画再处理更精细细节的观察，我们首先求解下颌运动，并基于下颌位置利用区域网络预测精细表情。最后，艺术家可选择性调用基于FDLS解的非线性微调过程，以尽可能紧密地跟随动作捕捉虚拟标记点。FDLS支持在需要时进行编辑以改进深度学习结果，并能处理演员面部形状的日常微小变化。在许多情况下，FDLS能以最少训练且无需或几乎无需人工干预即可实现可靠且达到生产质量的性能求解，同时在异常和困难情况下也允许引导和编辑求解过程。该系统已历经数年开发，并已应用于多部主流电影。