Self-supervised learning (SSL) for audio holds significant potential across many domains, especially where large amounts of unlabeled data are readily available at no cost. This is particularly relevant in bioacoustics, where biologists routinely collect extensive sound datasets from the natural environment. In this study, we demonstrate that SSL can learn meaningful representations of bird sounds from audio recordings without annotations. Our experiments show that these learned representations generalize to new bird species in few-shot learning (FSL) scenarios. Additionally, we show that using a pretrained audio neural network to select windows with high bird activation for self-supervised learning significantly improves the quality of the learned representations.
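The window-selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the pretrained network, the window length, or the selection rule, so everything here is an assumption: the function names `split_into_windows`, `mean_energy`, and `select_top_windows` are hypothetical, the rule "keep the k highest-scoring windows" is one plausible choice, and `mean_energy` is only a placeholder for the bird-activation score that a pretrained audio network would supply.

```python
from typing import Callable, List, Sequence

Window = List[float]


def split_into_windows(waveform: Sequence[float], window_size: int, hop_size: int) -> List[Window]:
    """Slice a 1-D waveform into fixed-length windows, dropping any incomplete tail."""
    starts = range(0, len(waveform) - window_size + 1, hop_size)
    return [list(waveform[s:s + window_size]) for s in starts]


def mean_energy(window: Sequence[float]) -> float:
    """Placeholder score (assumption): mean squared amplitude.

    In the setting described in the abstract, this would be replaced by the
    bird-class activation produced by a pretrained audio neural network.
    """
    return sum(x * x for x in window) / len(window)


def select_top_windows(
    windows: Sequence[Window],
    score_fn: Callable[[Sequence[float]], float],
    k: int,
) -> List[int]:
    """Return the indices of the k highest-scoring windows, in time order."""
    scores = [score_fn(w) for w in windows]
    ranked = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy recording: silence, a short burst of signal, then silence again.
waveform = [0.0] * 8 + [1.0] * 4 + [0.0] * 4
windows = split_into_windows(waveform, window_size=4, hop_size=4)
selected = select_top_windows(windows, mean_energy, k=1)
print(selected)  # index of the window containing the burst
```

Passing the scoring function as an argument keeps the selection logic independent of the model, so the placeholder can be swapped for a real network's activation without changing the rest of the pipeline. The selected windows would then serve as the input to self-supervised pretraining.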