Recent Trends in 3D Reconstruction of General Non-Rigid Scenes

Raza Yunus, Jan Eric Lenssen, Michael Niemeyer, Yiyi Liao, Christian Rupprecht, Christian Theobalt, Gerard Pons-Moll, Jia-Bin Huang, Vladislav Golyanik, Eddy Ilg

From arXiv: 42 pages, 18 figures, 5 tables; State-of-the-Art Report at EUROGRAPHICS 2024

Reconstructing models of the real world, including the 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesis of photorealistic novel views, which is useful for the movie industry and AR/VR applications. It also facilitates the content creation required for computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions in order to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs, such as data from RGB and RGB-D sensors, among others, conveying an understanding of the different approaches, their potential applications, and promising directions for further research. The report covers 3D reconstruction of general non-rigid scenes and further addresses techniques for scene decomposition, editing and controlling, and generalizable and generative modeling. More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field, and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion of the newly enabled applications. The STAR concludes with a discussion of the remaining limitations and open challenges.
