In wildlife observation and conservation, machine learning on audio recordings is becoming increasingly popular. Unfortunately, the datasets available in this field are often far from ideal training material: samples can be weakly labeled, vary in length, or have a poor signal-to-noise ratio. In this work, we introduce a generalized approach that first relabels subsegments of mel spectrogram representations in order to achieve higher performance on the actual multi-class classification task. For both the binary pre-sorting and the classification, we use convolutional neural networks (CNNs) and various data-augmentation techniques. We showcase the results of this approach on the challenging \textit{ComparE 2021} dataset, where the task is to classify the sounds of different primate species, and report significantly higher accuracy and unweighted average recall (UAR) scores than comparably equipped baseline models.
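Since the results are reported as both accuracy and UAR, it may help to see how the two metrics differ on imbalanced data. The following is a minimal sketch, not the evaluation code used in this work: the function names and the toy labels (`"noise"`, `"call"`) are hypothetical and chosen only for illustration. UAR is taken here in its usual sense, the mean of the per-class recalls, so every class counts equally regardless of how many samples it has.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, heavily imbalanced toy labels (not taken from the dataset).
y_true = ["noise"] * 8 + ["call"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["noise"] * 10

print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

In this toy case the majority-class predictor reaches an accuracy of 0.8 but a UAR of only 0.5, which is why UAR is the more informative score when classes are unevenly represented.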