This report surveys advances in deep learning-based modeling techniques for the analysis and synthesis of 3D indoor scenes. We describe the different representations used for indoor scenes, review the indoor scene datasets available for research in these areas, and discuss notable works that apply machine learning models to scene modeling tasks based on these representations. For analysis, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction, and 3D scene similarity. For synthesis, we mainly discuss neural scene synthesis works, while also highlighting model-driven methods that allow for human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that needs to be developed to adapt to the data representation and to the task setting in general. For each task, we provide a comprehensive summary of state-of-the-art works along axes such as the choice of data representation, backbone, evaluation metric, input, and output, yielding an organized review of the literature. Finally, we discuss some interesting research directions that have the potential to directly impact the way users interact and engage with these virtual scene models, making them an integral part of the metaverse.