Automated bioacoustic analysis is essential for biodiversity monitoring and conservation, and it requires advanced deep learning models that can adapt to diverse bioacoustic tasks. This article presents a comprehensive review of large-scale pretrained bioacoustic foundation models and systematically investigates their transferability across multiple bioacoustic classification tasks. We survey bioacoustic representation learning by analysing pretraining data sources and benchmarks. On this basis, we review bioacoustic foundation models, dissecting each model's training data, preprocessing, augmentations, architecture, and training paradigm. Additionally, we conduct an extensive empirical study of selected models on the BEANS and BirdSet benchmarks, evaluating their generalisability under linear and attentive probing. Our experimental analysis reveals the following: Perch~2.0, building on diverse multi-taxa supervised pretraining, achieves the highest BirdSet score (restricted evaluation) and the strongest linear probing result on BEANS; BirdMAE is the best model among probing-based strategies on BirdSet and ranks second on BEANS, after BEATs$_{NLM}$, the encoder of NatureLM-audio; attentive probing is beneficial for extracting the full performance of transformer-based models; and general-purpose audio models trained with self-supervised learning on AudioSet outperform many specialised bird sound models on BEANS when evaluated with attentive probing. These findings provide practical guidance for practitioners selecting models to adapt to new bioacoustic classification tasks via probing.
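The two probing protocols compared in this study differ mainly in how the frozen encoder's sequence of token embeddings is pooled before a trainable classifier. The NumPy sketch below illustrates that difference under simplifying assumptions: single-head attention with one learnable query, randomly initialised parameters, and no training loop. It is not the evaluation code used in this work, and all function and parameter names are hypothetical.

```python
import numpy as np

def linear_probe_logits(tokens, W, b):
    """Linear probing (sketch): mean-pool the frozen tokens, then apply one linear layer.

    tokens: (batch, num_tokens, dim) frozen encoder embeddings
    W: (dim, num_classes), b: (num_classes,) are the only trainable parameters
    """
    pooled = tokens.mean(axis=1)  # (batch, dim): every token weighted equally
    return pooled @ W + b         # (batch, num_classes)

def attentive_probe_logits(tokens, query, Wk, Wv, W, b):
    """Attentive probing (sketch): a learnable query cross-attends over the frozen
    tokens (single head, for brevity); the attended vector feeds a linear classifier.
    """
    keys = tokens @ Wk                                 # (batch, num_tokens, dim)
    values = tokens @ Wv                               # (batch, num_tokens, dim)
    scores = keys @ query / np.sqrt(query.shape[0])    # (batch, num_tokens)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over tokens
    pooled = np.einsum("bt,btd->bd", weights, values)  # (batch, dim): learned weighting
    return pooled @ W + b, weights

# Usage with random stand-ins for frozen encoder output (hypothetical sizes).
rng = np.random.default_rng(0)
batch, num_tokens, dim, num_classes = 4, 16, 32, 10
tokens = rng.standard_normal((batch, num_tokens, dim))

lin_logits = linear_probe_logits(
    tokens, rng.standard_normal((dim, num_classes)), np.zeros(num_classes))

att_logits, att_weights = attentive_probe_logits(
    tokens,
    rng.standard_normal(dim),                # learnable query
    rng.standard_normal((dim, dim)),         # key projection
    rng.standard_normal((dim, dim)),         # value projection
    rng.standard_normal((dim, num_classes)),
    np.zeros(num_classes),
)
```

In both cases the encoder stays frozen and only the probe parameters would be trained. Mean pooling treats all tokens alike, whereas the attention weights let the probe emphasise the tokens most relevant to the target class, which is one plausible reason attentive probing recovers more of a transformer's performance.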