Although generative models have had substantial impact across multiple domains, their potential for scene synthesis remains underexplored in robotics. This gap is especially evident in drone simulators, where environments are still built largely by hand, a process that is time-consuming and difficult to scale. In this work, we introduce AeroScene, a hierarchical diffusion model for progressive 3D scene synthesis. Our approach leverages hierarchy-aware tokenization and multi-branch feature extraction to reason jointly over global layouts and local details, ensuring physical plausibility and semantic consistency. This makes AeroScene particularly well suited to generating realistic scenes for aerial robotics tasks such as navigation, landing, and perching. We demonstrate its effectiveness through extensive experiments on our newly collected dataset and a public benchmark, showing that AeroScene significantly outperforms prior methods. Furthermore, we use AeroScene to generate a large-scale dataset of over 1,000 physics-ready, high-fidelity 3D scenes that can be directly integrated into NVIDIA Isaac Sim. Finally, we illustrate the utility of these generated environments on downstream drone navigation tasks. Our code and dataset are publicly available at aioz-ai.github.io/AeroScene/