This report surveys advances in deep learning-based modeling techniques for four 3D indoor scene analysis tasks and for the synthesis of 3D indoor scenes. We describe the different representations used for indoor scenes, review the indoor scene datasets available for research in these areas, and discuss notable works that apply machine learning models to scene modeling tasks based on these representations. For analysis, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction, and 3D scene similarity. For synthesis, we mainly discuss neural scene synthesis works, while also highlighting model-driven methods that allow for human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that must be developed to adapt to the data representation and to the task setting in general. For each task, we provide a comprehensive summary of state-of-the-art works along axes such as the choice of data representation, backbone, evaluation metric, input, and output, giving an organized review of the literature. Finally, we discuss some research directions that have the potential to directly affect how users interact and engage with these virtual scene models, making them an integral part of the metaverse.