Creating realistic or stylized facial and lip-sync animation is a tedious task: syncing the lips with the audio and conveying the right emotion on the character's face takes a lot of time and skill. To let animators spend more time on the artistic and creative parts of animation, we present Audio2Rig, a new deep-learning-based tool that leverages previously animated sequences of a show to generate facial and lip-sync rig animation from an audio file. Based in Maya, it learns from any production rig without adjustment and generates high-quality, stylized animation that mimics the style of the show. Audio2Rig fits the animator workflow: since it generates keys on the rig controllers, the animation can easily be retaken. The method is based on three neural network modules that can learn an arbitrary number of controllers, so different configurations can be created for specific parts of the face (such as the tongue, lips, or eyes). With Audio2Rig, animators can also pick different emotions and adjust their intensities to experiment with or customize the output, and they have high-level control over the keyframe settings. Our method shows excellent results, generating fine animation detail while respecting the show's style. Finally, because training relies on the studio's own data and is done internally, it ensures data privacy and prevents copyright infringement.
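The "keys on the rig controllers" output described above can be sketched as follows. This is a minimal illustration of the data flow only, not the paper's implementation: the controller names, the 24 fps rate, and the `values_to_keyframes` helper are all hypothetical assumptions.

```python
# Hypothetical sketch: the network predicts per-frame values for an arbitrary
# set of rig controllers, and those values are written back as ordinary
# keyframes so animators can retake the animation afterwards.

def values_to_keyframes(controller_values, fps=24):
    """Turn per-frame controller predictions into (controller, time, value) keys."""
    keys = []
    for name, frames in controller_values.items():
        for i, value in enumerate(frames):
            keys.append((name, i / fps, value))  # time in seconds
    return keys

# Example: two illustrative lip-sync controllers over three frames.
predicted = {
    "jaw_open":    [0.0, 0.6, 0.3],
    "lips_pucker": [0.1, 0.0, 0.4],
}
keys = values_to_keyframes(predicted)
```

Inside Maya, each resulting tuple would then be applied to the rig with a call such as `cmds.setKeyframe(ctrl, time=t, value=v)`, leaving ordinary editable keys on the controllers.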