1. Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data, and augmented data are cheaper to obtain than expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms.
2. We investigate two generative AI models as data augmentation tools to synthesise spectrograms and supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGANs) and Denoising Diffusion Probabilistic Models (DDPMs). The latter performed particularly well, both in the realism of the generated spectrograms and in the accuracy of the resulting classification task.
3. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls from wind farm sites in Ireland, approximately 800 samples of which have been labelled by experts. Wind farm data are particularly challenging for classification models given the background wind and turbine noise.
4. Training an ensemble of classification models on real and synthetic data combined gave 92.6% accuracy, compared with 90.5% using the real data alone, when evaluated against highly confident BirdNET predictions.
5. Our approach can be used to augment acoustic signals for more species and other land-use types, and has the potential to bring about a step-change in our capacity to develop reliable AI-based detection of rare species. Our code is available at https://github.com/gibbona1/SpectrogramGenAI.
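The claim in point 1 that classic image augmentations can be unsuitable for spectrograms can be illustrated with a minimal toy sketch (NumPy, illustrative only, not from the paper's code base): a vertical flip, harmless for natural images, relocates a call in frequency and so changes its acoustic meaning, whereas a time shift leaves the frequency content intact.

```python
import numpy as np

# Toy spectrogram: rows = frequency bins (low to high), columns = time frames.
# Energy is confined to the highest frequency bin, mimicking a high-pitched call.
spec = np.zeros((4, 5))
spec[3, :] = 1.0

# Vertical flip: the "call" now sits in the LOWEST frequency bin,
# i.e. the augmented sample represents a different sound entirely.
flipped = np.flipud(spec)
print(flipped[0, :].sum(), flipped[3, :].sum())  # energy moved from row 3 to row 0

# Time shift (circular roll along the time axis): the frequency profile
# is unchanged, which is why time shifting is a safer classical
# augmentation for audio than frequency-axis flips.
shifted = np.roll(spec, 2, axis=1)
print(shifted.sum(axis=1))  # per-frequency energy identical to the original
```

This is one reason generative approaches such as the ACGANs and DDPMs investigated here are attractive: they can synthesise new, label-consistent spectrograms rather than geometrically distorting real ones.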