Advances in 4D Representation: Geometry, Motion, and Interaction

We present a survey on 4D generation and reconstruction, a fast-evolving subfield of computer graphics whose development has been propelled by recent advances in neural fields, geometric and motion deep learning, and 3D generative artificial intelligence (GenAI). While ours is not the first survey of this domain, we structure our coverage around the distinctive perspective of 4D representations, which model 3D geometry that evolves over time while exhibiting motion and interaction. Rather than exhaustively enumerating prior work, we take a selective approach and focus on representative works that highlight both the desirable properties and the ensuing challenges of each representation under different computation, application, and data scenarios. The main take-away we aim to convey is how to select, and then customize, an appropriate 4D representation for a given task. Organizationally, we group 4D representations under three key pillars: geometry, motion, and interaction. Our discussion encompasses not only the most popular representations today, such as neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS), but also representations that remain relatively under-explored in the 4D context, such as structured models and long-range motions. Throughout the survey, we revisit the role of large language models (LLMs) and video foundation models (VFMs) in a variety of 4D applications, while steering the discussion toward their current limitations and how they can be addressed. We also provide dedicated coverage of which 4D datasets are currently available and which are still lacking for driving the subfield forward. Project page: https://mingrui-zhao.github.io/4DRep-GMI/

翻译：本文综述了4D生成与重建领域——计算机图形学中快速演化的子领域，其发展得益于神经场、几何与运动深度学习以及3D生成式人工智能（GenAI）的最新进展。虽非首篇领域综述，但本工作以独特的4D表示视角构建领域覆盖框架，旨在建模随时间演化的三维几何体及其运动与交互特性。具体而言，我们未采用穷举式文献罗列，而是选择性聚焦代表性工作，突出不同计算场景、应用场景与数据场景下各类表示的理想属性与伴随挑战。核心主旨在于指导读者如何为其任务选择并定制合适的4D表示。在组织结构上，我们基于三大核心支柱区分4D表示：几何、运动与交互。讨论内容不仅涵盖当前最流行的表示方法（如神经辐射场NeRF与三维高斯泼溅3DGS），还将关注4D背景下相对待开发的表示形式（如结构化模型与长程运动）。全篇将反复强调大语言模型（LLM）与视频基础模型（VFM）在多元4D应用中的作用，同时引导讨论走向其当前局限性与解决方案。此外，我们专辟章节探讨推动子领域发展的现有4D数据集及其不足。项目页面：https://mingrui-zhao.github.io/4DRep-GMI/