M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation

In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions and detailed geometric output. It not only provides a structural blueprint for ensuring physical plausibility but also supports semantic controllability and interactive editing. However, the learning capabilities of current 3D indoor layout generation models are constrained by the limited scale, diversity, and annotation quality of existing datasets. To address this, we introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 21,367 layouts and over 433k object instances, integrating three distinct sources: real-world scans, professional CAD designs, and procedurally generated scenes. Each layout is paired with detailed structured text describing a global scene summary, the relational placement of large furniture, and the fine-grained arrangement of smaller items. This diverse and richly annotated resource enables models to learn complex spatial and semantic patterns across a wide variety of indoor environments. To assess the potential of M3DLayout, we establish a benchmark using both a text-conditioned diffusion model and a text-conditioned autoregressive model. Experimental results demonstrate that our dataset provides a solid foundation for training layout generation models. Its multi-source composition enhances diversity, notably through the Inf3DLayout subset, which provides rich small-object information and enables the generation of more complex and detailed scenes. We hope that M3DLayout can serve as a valuable resource for advancing research in text-driven 3D scene synthesis. The dataset and code will be made publicly available upon acceptance.
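The sketch below is a reading aid for the structure the abstract describes. It shows, in minimal Python, what a single layout record might look like. It is not the released M3DLayout format. Every class and field name (`LayoutRecord`, `ObjectInstance`, `StructuredDescription`, `center`, `extent`, `yaw`, `is_small`, `to_prompt`) is a hypothetical assumption. Only three things come from the abstract: the three source types, the three description levels, and the headline counts. The helper `mean_objects_per_layout` works through the average implied by those counts: 433,000 / 21,367 ≈ 20.3 object instances per layout. This is a lower bound, since the abstract says "over 433k".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Source(Enum):
    # The three source types listed in the abstract; the string values are hypothetical.
    REAL_WORLD_SCAN = "real_world_scan"
    CAD_DESIGN = "cad_design"
    PROCEDURAL = "procedural"


@dataclass
class ObjectInstance:
    # Hypothetical per-object fields: a category label plus an oriented 3D bounding box.
    category: str
    center: Tuple[float, float, float]  # assumed (x, y, z) box centre
    extent: Tuple[float, float, float]  # assumed box size along each axis
    yaw: float = 0.0                    # assumed rotation about the vertical axis, in radians
    is_small: bool = False              # assumed flag separating small items from large furniture


@dataclass
class StructuredDescription:
    # The three description levels named in the abstract; field names are hypothetical.
    global_summary: str
    large_furniture_relations: List[str] = field(default_factory=list)
    small_item_arrangements: List[str] = field(default_factory=list)


@dataclass
class LayoutRecord:
    layout_id: str
    source: Source
    objects: List[ObjectInstance]
    description: StructuredDescription

    def num_small_objects(self) -> int:
        """Count the objects flagged as small items."""
        return sum(1 for obj in self.objects if obj.is_small)

    def to_prompt(self) -> str:
        """Join the three description levels, coarse to fine, into one text condition."""
        d = self.description
        return " ".join([d.global_summary, *d.large_furniture_relations, *d.small_item_arrangements])


def mean_objects_per_layout(num_objects: int, num_layouts: int) -> float:
    """Average number of object instances per layout."""
    return num_objects / num_layouts


if __name__ == "__main__":
    record = LayoutRecord(
        layout_id="example_0001",  # hypothetical identifier
        source=Source.PROCEDURAL,
        objects=[
            ObjectInstance("bed", (0.0, 0.3, 0.0), (2.0, 0.6, 1.6)),
            ObjectInstance("nightstand", (1.2, 0.25, 0.0), (0.5, 0.5, 0.4)),
            ObjectInstance("lamp", (1.2, 0.7, 0.0), (0.2, 0.4, 0.2), is_small=True),
        ],
        description=StructuredDescription(
            global_summary="A compact bedroom.",
            large_furniture_relations=["A nightstand stands to the right of the bed."],
            small_item_arrangements=["A lamp sits on the nightstand."],
        ),
    )
    print(record.to_prompt())
    print(record.num_small_objects())
    print(f"{mean_objects_per_layout(433_000, 21_367):.2f}")
```

Concatenating the levels from coarse to fine is one plausible way to turn the structured text into a single condition for a text-conditioned generator. The abstract does not say how the benchmark's diffusion and autoregressive models actually consume the descriptions.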