Self-supervised learning (SSL) for audio holds significant potential across many domains, particularly where abundant unlabeled data is readily available at no cost. This is especially pertinent in bioacoustics, where biologists routinely collect extensive sound datasets from the natural environment. In this study, we demonstrate that SSL can learn meaningful representations of bird sounds from audio recordings without annotations. Our experiments show that these learned representations generalize to new bird species in few-shot learning (FSL) scenarios. Additionally, we show that using a pretrained audio neural network to select windows with high bird activation for self-supervised learning significantly enhances the quality of the learned representations.
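The window-selection idea mentioned above can be illustrated with a minimal sketch. It assumes that a per-window bird activation score has already been produced by some pretrained audio tagger; the function names `frame_windows` and `select_windows`, the window length, the hop size, and the scores themselves are hypothetical and are not taken from the study.

```python
import numpy as np

def frame_windows(signal, win_len, hop):
    """Split a 1-D signal into overlapping windows of length win_len."""
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one window")
    n = 1 + (len(signal) - win_len) // hop
    return np.stack([signal[i * hop : i * hop + win_len] for i in range(n)])

def select_windows(windows, bird_scores, k=1):
    """Keep the k windows with the highest bird activation.

    bird_scores is a hypothetical stand-in for the output of a pretrained
    audio network: one score per window, higher meaning more bird activity.
    """
    top = np.argsort(bird_scores)[::-1][:k]  # indices of the k highest scores
    top = np.sort(top)                       # restore chronological order
    return windows[top], top

# Toy example: 10 samples, windows of 4 samples with a hop of 2 -> 4 windows.
signal = np.arange(10, dtype=float)
windows = frame_windows(signal, win_len=4, hop=2)
scores = np.array([0.1, 0.9, 0.3, 0.8])      # assumed activations, one per window
selected, idx = select_windows(windows, scores, k=2)
```

In this toy run the second and fourth windows are kept, and only those would be passed on to self-supervised training. How the scores are computed and how many windows are kept per recording are choices this sketch leaves open.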