Self-supervised learning (SSL) has opened new opportunities in bioacoustics by enabling scalable modeling of animal vocalizations without expensive manual annotation. However, current SSL models in this domain prioritize broad generalization across species and are not optimized for uncovering the fine-grained structure of individual communication systems. In this work, we collect and release a novel dataset of longitudinal recordings spanning more than five years from five known dolphins in a semi-naturalistic marine environment, an unprecedented resource for studying dolphin communication. We adapt the Wav2Vec2.0 architecture (Baevski et al., 2020) to this domain and introduce Dolph2Vec, the first large-scale, species-specific SSL model trained exclusively on these data. We benchmark our model on two biologically relevant tasks: signature whistle classification and whistle detection. Dolph2Vec significantly outperforms general-purpose baselines on both tasks. Beyond performance, we show that the learned embeddings and codebook structure capture interpretable acoustic units that align with dolphin whistle categories and possibly with sub-whistle structure, enabling fine-grained analysis of communication patterns. Our findings demonstrate how SSL can serve as both a model and a scientific tool for exploring hypotheses in animal communication research.