3D reconstruction of dynamic scenes is a long-standing problem in computer graphics that becomes increasingly difficult as less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often from a single monocular camera without depth information, as in ordinary smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow to optimize. To address this problem, we propose a novel SfT reconstruction algorithm for cloth that uses a pre-trained neural surrogate model. The surrogate is fast to evaluate and stable, and it produces smooth reconstructions because the underlying physics simulation acts as a regularizer. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence. These comparisons drive a gradient-based optimization procedure that extracts not only shape information but also physical parameters such as the stretching, shearing, or bending stiffness of the cloth. This makes it possible to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400 to 500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.
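The gradient-based recovery of physical parameters described above can be illustrated with a deliberately small sketch. It is not the proposed method: a one-dimensional damped mass-spring system stands in for the cloth simulation, the observed positions stand in for rendered pixels, and no neural surrogate or differentiable renderer is involved. All names (`simulate`, `loss_and_grad`, `fit_stiffness`) and all numerical settings are assumptions made for illustration only. The sketch shows how a stiffness value can be estimated by differentiating a trajectory-matching loss through the simulation steps.

```python
# Toy illustration (assumed setup, not the paper's pipeline):
# estimate a spring stiffness k by gradient descent on the mismatch
# between a simulated trajectory and a target trajectory.

def simulate(k, steps=100, dt=0.01, damping=0.1):
    """Roll out a unit mass on a damped spring with stiffness k.

    Returns the positions and their sensitivities d(position)/dk,
    which are propagated alongside the state (forward-mode derivative).
    """
    x, v = 1.0, 0.0      # initial displacement and velocity
    dx, dv = 0.0, 0.0    # derivatives of x and v with respect to k
    xs, dxs = [], []
    for _ in range(steps):
        # Acceleration and its derivative with respect to k,
        # both evaluated at the current state.
        a = -k * x - damping * v
        da = -x - k * dx - damping * dv
        # Semi-implicit Euler: update velocity first, then position.
        v += dt * a
        dv += dt * da
        x += dt * v
        dx += dt * dv
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def loss_and_grad(k, target):
    """Mean squared trajectory error and its exact derivative in k."""
    xs, dxs = simulate(k, steps=len(target))
    n = len(target)
    loss = sum((x - t) ** 2 for x, t in zip(xs, target)) / n
    grad = sum(2.0 * (x - t) * d for x, t, d in zip(xs, target, dxs)) / n
    return loss, grad


def fit_stiffness(target, k_init=1.0, lr=5.0, iters=200):
    """Plain gradient descent on the stiffness parameter."""
    k = k_init
    for _ in range(iters):
        _, grad = loss_and_grad(k, target)
        k -= lr * grad
    return k


# Synthetic "observation" generated with a known stiffness.
k_true = 4.0
target, _ = simulate(k_true)

k_est = fit_stiffness(target)
print(f"true stiffness: {k_true}, estimated stiffness: {k_est:.4f}")
```

The time horizon is kept shorter than one oscillation period on purpose: over long horizons, trajectory-matching losses for oscillating systems tend to develop local minima, which is one reason a regularizing simulation and a good initialization matter in practice.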