CG crowds have become increasingly popular over the last decade in the VFX and animation industry: formerly reserved for a few high-end studios and blockbusters, they are now widely used in TV shows and commercials. Yet one major limitation remains: to be ingested properly by crowd software, studio rigs have to comply with specific prerequisites, especially in terms of deformations. Usually only skinning, blend shapes and geometry caches are supported, which prevents close-up shots with facial performances on crowd characters. We envisioned two approaches to tackle this: either reverse engineer the hundreds of deformer nodes available in the major DCCs and plugins and incorporate them into our crowd package, or surf the machine learning wave and compress the deformations of a rig using a neural network architecture. Considering that we could not commit 5+ man-years of development to this problem, and that we were excited to dip our toes in the machine learning pool, we went for the latter. From our first tests to a minimum viable product, we went through hopes and disappointments: we hit multiple pitfalls, took false shortcuts and ran into dead ends before reaching our destination. With this paper, we hope to provide valuable feedback by sharing the lessons we learnt from this experience.