Although generative models have shown substantial impact across multiple domains, their potential for scene synthesis remains underexplored in robotics. This gap is especially evident in drone simulators, where environments still rely heavily on manual construction, which is time-consuming and difficult to scale. In this work, we introduce AeroScene, a hierarchical diffusion model for progressive 3D scene synthesis. Our approach leverages hierarchy-aware tokenization and multi-branch feature extraction to reason across both global layouts and local details, ensuring physical plausibility and semantic consistency. This makes AeroScene particularly suited to generating realistic scenes for aerial robotics tasks such as navigation, landing, and perching. We demonstrate its effectiveness through extensive experiments on our newly collected dataset and a public benchmark, showing that AeroScene significantly outperforms prior methods. Furthermore, we use AeroScene to generate a large-scale dataset of over 1,000 physics-ready, high-fidelity 3D scenes that can be directly integrated into NVIDIA Isaac Sim. Finally, we illustrate the utility of these generated environments on downstream drone navigation tasks. Our code and dataset are publicly available at aioz-ai.github.io/AeroScene/