This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as the synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes and the various indoor scene datasets available for research in these areas, and we discuss notable works that employ machine learning models for such scene modeling tasks based on these representations. Specifically, with respect to analysis, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction, and 3D scene similarity. For synthesis, we mainly discuss neural scene synthesis works, while also highlighting model-driven methods that enable human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that must be developed to adapt to the data representation and to the task setting in general. For each task, we provide a comprehensive summary of state-of-the-art works along different axes, such as the choice of data representation, backbone, evaluation metric, input, and output, yielding an organized review of the literature. Finally, we discuss promising research directions that could directly impact how users interact and engage with these virtual scene models, making them an integral part of the metaverse.