InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

The advancement of Embodied AI relies heavily on large-scale, simulatable 3D scene datasets with diverse scenes and realistic layouts. However, existing datasets typically suffer from limited scale or diversity, sanitized layouts that lack small items, and severe object collisions. To address these shortcomings, we introduce InternScenes, a large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes. It integrates three disparate scene sources (real-world scans, procedurally generated scenes, and designer-created scenes), includes 1.96M 3D objects, and covers 15 common scene types and 288 object classes. We deliberately preserve the many small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our data processing pipeline ensures simulatability by creating real-to-sim replicas for real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions through physical simulation. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both reveal new challenges posed by the complex and realistic layouts. More importantly, InternScenes paves the way for scaling up model training for both tasks, making generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.

