Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics for assessing environmental health. To realize the full potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), DL models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation hinders a comprehensive evaluation of generalization performance. We therefore introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing. Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.