1. Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data, and augmented data are cheaper to obtain than expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms.
2. We investigate two generative AI models as data augmentation tools to synthesise spectrograms and supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGANs) and Denoising Diffusion Probabilistic Models (DDPMs). The latter performed particularly well, both in the realism of the generated spectrograms and in the accuracy of the resulting classification task.
3. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls from wind farm sites in Ireland, approximately 800 samples of which have been labelled by experts. Wind farm data are particularly challenging for classification models given the background wind and turbine noise.
4. Training an ensemble of classification models on real and synthetic data combined gave 92.6% accuracy, compared with 90.5% using the real data alone, when evaluated against highly confident BirdNET predictions.
5. Our approach can be used to augment acoustic signals for more species and other land-use types, and has the potential to bring about a step-change in our capacity to develop reliable AI-based detection of rare species. Our code is available at https://github.com/gibbona1/SpectrogramGenAI.
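The claim in point 1 that classic image augmentations can be unsuitable for spectrograms can be illustrated with a minimal toy sketch (NumPy, illustrative only, not from the paper's code base): a vertical flip, harmless for natural images, relocates a call in frequency and so changes its acoustic meaning, whereas a time shift leaves the frequency content intact.

```python
import numpy as np

# Toy spectrogram: rows = frequency bins (low to high), columns = time frames.
# Energy is confined to the highest frequency bin, mimicking a high-pitched call.
spec = np.zeros((4, 5))
spec[3, :] = 1.0

# Vertical flip: the "call" now sits in the LOWEST frequency bin,
# i.e. the augmented sample represents a different sound entirely.
flipped = np.flipud(spec)
print(flipped[0, :].sum(), flipped[3, :].sum())  # energy moved from row 3 to row 0

# Time shift (circular roll along the time axis): the frequency profile
# is unchanged, which is why time shifting is a safer classical
# augmentation for audio than frequency-axis flips.
shifted = np.roll(spec, 2, axis=1)
print(shifted.sum(axis=1))  # per-frequency energy identical to the original
```

This is one reason generative approaches such as the ACGANs and DDPMs investigated here are attractive: they can synthesise new, label-consistent spectrograms rather than geometrically distorting real ones.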