We present a novel approach for automatically detecting and classifying great ape calls in continuous raw audio recordings collected during field research. Our method leverages deep pretrained and sequential neural networks, including wav2vec 2.0 and LSTM, and is validated on three data sets from three different great ape lineages (orangutans, chimpanzees, and bonobos). The recordings were collected by different researchers and follow different annotation schemes; our pipeline preprocesses them and trains on them in a uniform fashion. The pipeline attains high accuracy on both call detection and classification. The method is designed to generalize to other animal species and, more broadly, to other sound event detection tasks. To foster future research, we make our pipeline and methods publicly available.
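The abstract does not spell out how differently annotated recordings are brought into a common form. As an illustration only, one plausible way to do this for sound event detection is to convert each time-stamped annotation into per-frame labels aligned with the feature frames of the pretrained model, and to merge frame predictions back into call events. The sketch below assumes annotations arrive as `(start_sec, end_sec, label)` tuples and uses the roughly 20 ms frame stride of wav2vec 2.0 on 16 kHz audio. The function names, the `"none"` background label, and the call labels are hypothetical and are not taken from the published pipeline.

```python
from typing import List, Sequence, Tuple

# wav2vec 2.0 emits roughly one feature frame per 20 ms of 16 kHz audio.
FRAME_SEC = 0.02

# Assumed annotation shape: (start_sec, end_sec, call_label).
Annotation = Tuple[float, float, str]


def annotations_to_frame_labels(
    annotations: Sequence[Annotation],
    duration_sec: float,
    background: str = "none",  # hypothetical label for "no call"
    frame_sec: float = FRAME_SEC,
) -> List[str]:
    """Turn time-stamped annotations into one label per feature frame."""
    n_frames = int(round(duration_sec / frame_sec))
    labels = [background] * n_frames
    for start, end, call in annotations:
        first = max(0, int(round(start / frame_sec)))
        last = min(n_frames, int(round(end / frame_sec)))
        for i in range(first, last):
            labels[i] = call  # later annotations overwrite earlier ones
    return labels


def frame_labels_to_events(
    labels: Sequence[str],
    background: str = "none",
    frame_sec: float = FRAME_SEC,
) -> List[Annotation]:
    """Merge runs of identical non-background frame labels into events."""
    events: List[Annotation] = []
    current, start = background, 0
    # The trailing background sentinel closes an event that runs to the end.
    for i, label in enumerate(list(labels) + [background]):
        if label != current:
            if current != background:
                events.append((start * frame_sec, i * frame_sec, current))
            current, start = label, i
    return events


if __name__ == "__main__":
    # Hypothetical one-second recording with two annotated calls.
    example = [(0.10, 0.30, "long_call"), (0.50, 0.56, "grunt")]
    frame_labels = annotations_to_frame_labels(example, duration_sec=1.0)
    for event in frame_labels_to_events(frame_labels):
        print(event)
```

With labels in this form, call detection reduces to deciding whether a frame is background, and classification to choosing among the call labels, so a single sequence model can in principle be trained for both regardless of the original annotation scheme.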