Dense 3D reconstruction and tracking of dynamic scenes from monocular video remains an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4D is that any pixel can be unprojected into 3D and then mapped to any time step and any camera. We evaluate the dataset on multiple downstream tasks, including 4D scene reconstruction, 3D point tracking, geometry-aware camera retargeting, and human pose estimation, to demonstrate its utility and effectiveness. The experimental results highlight Syn4D's potential to facilitate research in dynamic scene understanding and spatiotemporal modeling.
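To make the unprojection property concrete, the following is a minimal sketch of how a pixel with ground-truth depth could be lifted into 3D, moved to another time step, and projected into another camera. It is not Syn4D's actual data format or API. It assumes a standard pinhole model with intrinsics `K`, z-depth (distance along the optical axis), 4x4 camera-to-world poses, and a per-point 3D displacement standing in for the dense tracking annotation. The function names `unproject` and `project` and all numeric values are hypothetical.

```python
import numpy as np

def unproject(pixel, depth, K, cam_to_world):
    """Lift pixel (u, v) with z-depth into a 3D world point (pinhole model, assumed)."""
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # camera-frame ray with z = 1
    p_cam = ray * depth                              # scale so that z equals depth
    return cam_to_world[:3, :3] @ p_cam + cam_to_world[:3, 3]

def project(p_world, K, cam_to_world):
    """Project a 3D world point into a camera; returns (pixel, z-depth)."""
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    p_cam = R.T @ (p_world - t)                      # invert the camera-to-world pose
    uvw = K @ p_cam
    return uvw[:2] / uvw[2], p_cam[2]

# Hypothetical intrinsics and poses, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
cam_a = np.eye(4)                                    # source camera at the origin
cam_b = np.eye(4)
cam_b[0, 3] = 1.0                                    # target camera shifted 1 unit along x

# Unproject the principal-point pixel of camera A at depth 2.
p_t0 = unproject((320.0, 240.0), 2.0, K, cam_a)

# Assumed 3D displacement of this point between two time steps (from dense tracking).
motion = np.array([0.5, 0.0, 0.0])
p_t1 = p_t0 + motion

uv_static, z_static = project(p_t0, K, cam_b)        # same time, other camera
uv_moved, z_moved = project(p_t1, K, cam_b)          # later time, other camera
```

With depth, camera pose, and 3D motion all available as ground truth, the same two functions cover each combination of source pixel, target time, and target camera. A real loader would also have to handle occlusion and points that fall outside the target image, which this sketch ignores.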