Despite advancements in neural implicit models for 3D surface reconstruction, handling dynamic environments with interactions between arbitrary rigid, non-rigid, or deformable entities remains challenging. Generic reconstruction methods adaptable to such dynamic scenes often require additional inputs, such as depth or optical flow, or rely on pre-trained image features for reasonable outcomes. These methods typically use latent codes to capture frame-by-frame deformations. Another set of dynamic scene reconstruction methods is entity-specific, mostly focuses on humans, and relies on template models. In contrast, some template-free methods bypass these requirements and adopt traditional LBS (Linear Blend Skinning) weights for a detailed representation of deformable object motions, although they involve complex optimizations that lead to lengthy training times. To address these limitations, this paper introduces TFS-NeRF, a template-free 3D semantic NeRF for dynamic scenes captured from sparse or single-view RGB videos, which handles interactions between two entities and is more time-efficient than other LBS-based approaches. Our framework uses an Invertible Neural Network (INN) for LBS prediction, simplifying the training process. By disentangling the motions of interacting entities and optimizing per-entity skinning weights, our method efficiently generates accurate, semantically separable geometries. Extensive experiments demonstrate that our approach produces high-quality reconstructions of both deformable and non-deformable objects in complex interactions, with improved training efficiency compared to existing methods.
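For readers unfamiliar with the LBS formulation the abstract refers to, the classical deformation model blends per-bone rigid transforms by per-point skinning weights: a point $\mathbf{x}$ with weights $w_b$ maps to $\sum_b w_b \, T_b \, \mathbf{x}$. The sketch below is a minimal, generic NumPy illustration of that standard formula, not the paper's INN-based predictor; the function name and array shapes are our own choices.

```python
import numpy as np

def linear_blend_skinning(points, weights, transforms):
    """Deform rest-pose points by weight-blending per-bone rigid transforms.

    points:     (N, 3) rest-pose vertex positions
    weights:    (N, B) skinning weights; each row sums to 1
    transforms: (B, 4, 4) homogeneous bone transformation matrices
    """
    # Lift points to homogeneous coordinates: (N, 4)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    # Blend the bone transforms per point: (N, 4, 4)
    blended = np.einsum('nb,bij->nij', weights, transforms)
    # Apply each point's blended transform and drop the homogeneous coordinate
    deformed = np.einsum('nij,nj->ni', blended, homo)
    return deformed[:, :3]
```

With identity bone transforms the deformation is the identity map; in an LBS-based reconstruction pipeline, the weights (here given explicitly) are the quantity being learned per entity.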