A major barrier to advancing aerial autonomy has been the collection of large-scale aerial datasets for training machine learning models. Because collecting real-world data by deploying drones is costly and time-consuming, training for drone applications has increasingly shifted towards synthetic data. However, for models to generalize widely and transfer to the real world, two things have proven essential: increasing the diversity of simulation environments so that a model is trained across their variations, and augmenting the training data. Current synthetic aerial data generation tools either lack data augmentation or rely heavily on manual work or real samples to configure and generate diverse, realistic simulation scenes for data collection. These dependencies limit the scalability of the data generation workflow, so balancing generalizability and scalability remains a major challenge in synthetic data generation. To address these gaps, we introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications. ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations to generate diverse aerial datasets for different training tasks. ASDA improves the efficiency of the data generation workflow by providing a unified prompt-based interface over the integrated pipelines for flexible control. The procedural generative approach of our data augmentation is performant and adaptable to different simulation environments, training tasks, and data collection needs. We demonstrate the effectiveness of our method in automatically generating diverse datasets and show its potential for downstream performance optimization.