Recent Trends in 3D Reconstruction of General Non-Rigid Scenes

Raza Yunus, Jan Eric Lenssen, Michael Niemeyer, Yiyi Liao, Christian Rupprecht, Christian Theobalt, Gerard Pons-Moll, Jia-Bin Huang, Vladislav Golyanik, Eddy Ilg

From arXiv. 42 pages, 18 figures, 5 tables. State-of-the-Art Report at EUROGRAPHICS 2024. Project page: https://razayunus.github.io/non-rigid-star

Reconstructing models of the real world, including the 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesis of photorealistic novel views, which is useful for the movie industry and AR/VR applications. It also facilitates the content creation needed in computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions in order to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs, such as data from RGB and RGB-D sensors, among others, conveying an understanding of the different approaches, their potential applications, and promising directions for further research. The report covers 3D reconstruction of general non-rigid scenes and further addresses techniques for scene decomposition, editing and control, and generalizable and generative modeling. More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion of the newly enabled applications. The STAR concludes with a discussion of the remaining limitations and open challenges.
