Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly where unlabeled data is abundant and readily available at no cost. This is the case in bioacoustics, where biologists routinely collect extensive sound datasets from the natural environment. In this study, we demonstrate that SSL can learn meaningful representations of bird sounds from audio recordings without annotations. Our experiments show that these learned representations generalize to new bird species in few-shot learning (FSL) scenarios. Additionally, we show that using a pretrained audio neural network to select windows with high bird activation for self-supervised learning significantly enhances the quality of the learned representations.
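As a rough illustration of the window-selection idea, the sketch below picks the fixed-length window with the highest mean bird activation from a sequence of frame-level scores. It is only a minimal sketch under stated assumptions: the scores are assumed to come from some pretrained audio network (not shown here), the function name `select_window` and its parameters are hypothetical, and choosing the window by highest mean activation is an assumed selection rule, since the abstract does not specify one.

```python
import numpy as np


def select_window(frame_scores, window_frames):
    """Return (start, end) frame indices of the window whose mean
    bird activation is highest.

    frame_scores: 1-D sequence of per-frame bird activation scores,
        assumed to be produced by a pretrained audio network.
    window_frames: window length in frames (hypothetical parameter).
    """
    scores = np.asarray(frame_scores, dtype=float)
    if scores.ndim != 1:
        raise ValueError("frame_scores must be one-dimensional")
    if window_frames <= 0 or window_frames > scores.size:
        raise ValueError("window_frames must be in [1, len(frame_scores)]")

    # Sliding-window sums from a cumulative sum: the sum of
    # scores[i:i + window_frames] is csum[i + window_frames] - csum[i].
    csum = np.concatenate(([0.0], np.cumsum(scores)))
    window_sums = csum[window_frames:] - csum[:-window_frames]

    # All windows have the same length, so the highest sum is also
    # the highest mean activation.
    start = int(np.argmax(window_sums))
    return start, start + window_frames


# Example with made-up scores: the activity is concentrated in frames 2 to 4.
example_scores = [0.1, 0.1, 0.9, 0.8, 0.7, 0.1]
best_start, best_end = select_window(example_scores, window_frames=3)
```

The selected frame range would then be mapped back to audio samples and used as the input segment for self-supervised training. A threshold on the scores or a top-k selection over several windows would be equally plausible variants.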