We present a neural network-based simulation super-resolution framework that efficiently and realistically enhances a facial performance produced by a low-cost, real-time physics-based simulation to a level of detail that closely approximates that of a reference-quality offline simulator with much higher resolution (26× the element count in our examples) and accurate physical modeling. Our approach is rooted in our ability to construct, via simulation, a training set of paired frames from the low- and high-resolution simulators that are in semantic correspondence with each other. We use face animation as an exemplar of such a simulation domain, where this semantic congruence is achieved simply by dialing in the same muscle actuation controls and skeletal pose in the two simulators. Our proposed framework generalizes from this training set to unseen expressions, compensates for modeling discrepancies between the two simulations caused by limited resolution or cost-cutting approximations in the real-time variant, and requires no semantic descriptors or parameters as input other than the result of the real-time simulation. We evaluate the efficacy of our pipeline on a variety of expressive performances and provide comparisons and ablation experiments for plausible variations of and alternatives to our proposed scheme.
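The pairing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: `make_toy_simulator`, `simulate_low`, `simulate_high`, `build_paired_dataset`, and all sizes are hypothetical stand-ins, and the "simulators" are toy random maps rather than physics solvers. The only point shown is that feeding identical muscle and pose controls to both simulators yields semantically corresponding frame pairs, after which the controls are discarded so that a network would see only the low-resolution result.

```python
import numpy as np

# All names and sizes below are hypothetical, chosen only for illustration.
N_MUSCLES = 4              # number of muscle actuation controls (assumed)
N_POSE = 3                 # number of skeletal pose parameters (assumed)
N_LOW, N_HIGH = 50, 1300   # toy vertex counts; the 26x ratio echoes the abstract


def make_toy_simulator(n_vertices, seed, nonlinear):
    """Return a stand-in 'simulator' mapping controls to (n_vertices, 3) positions."""
    rng = np.random.default_rng(seed)
    rest = rng.normal(size=(n_vertices, 3))
    basis = 0.1 * rng.normal(size=(N_MUSCLES + N_POSE, n_vertices, 3))

    def simulate(muscles, pose):
        controls = np.concatenate([muscles, pose])
        disp = np.tensordot(controls, basis, axes=1)  # shape (n_vertices, 3)
        if nonlinear:
            # Crude stand-in for the richer behavior of the offline simulator.
            disp = np.tanh(disp)
        return rest + disp

    return simulate


simulate_low = make_toy_simulator(N_LOW, seed=1, nonlinear=False)
simulate_high = make_toy_simulator(N_HIGH, seed=2, nonlinear=True)


def build_paired_dataset(n_frames, seed=0):
    """Sample controls and run BOTH simulators on the same controls per frame."""
    rng = np.random.default_rng(seed)
    low_frames, high_frames = [], []
    for _ in range(n_frames):
        muscles = rng.uniform(0.0, 1.0, size=N_MUSCLES)
        pose = rng.uniform(-0.5, 0.5, size=N_POSE)
        low_frames.append(simulate_low(muscles, pose))
        high_frames.append(simulate_high(muscles, pose))
    # Only the frames are returned: the controls are not part of the dataset.
    return np.stack(low_frames), np.stack(high_frames)


low, high = build_paired_dataset(8)
print(low.shape, high.shape)  # (8, 50, 3) (8, 1300, 3)
```

A super-resolution model trained on such pairs would take `low[i]` as its sole input and regress `high[i]`, which matches the stated property that no semantic descriptors or parameters are needed at inference time.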