The scarcity of labelled data makes it challenging to train Deep Neural Network (DNN) models for bioacoustic applications, where manually labelling the required amount of data can be prohibitively expensive. To identify both new and existing classes effectively, DNN models must keep learning new features from a modest amount of fresh data. Active Learning (AL) can support this learning while requiring little labelling effort. However, fixed feature-extraction approaches limit feature quality, so the benefits of AL are underutilised. We describe an AL framework that addresses this issue by incorporating feature extraction into the AL loop and refining the feature extractor after each round of manual annotation. In addition, we take the novel approach of processing raw audio rather than spectrograms. Experiments show that, for a large DNN model, the proposed AL framework requires 14.3%, 66.7%, and 47.4% less labelling effort on the benchmark audio datasets ESC-50, UrbanSound8k, and InsectWingBeat, respectively, with similar savings for a microcontroller-based counterpart. Furthermore, we demonstrate the practical relevance of our study by incorporating data from conservation biology projects. All code is publicly available on GitHub.
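The key idea above is that the feature extractor is re-fitted inside the AL loop after every annotation round, rather than being fixed up front. The sketch below illustrates that loop structure only, under heavy simplifying assumptions that are not the authors' method: raw inputs are short numeric vectors standing in for raw audio, the "feature extractor" is a toy per-dimension Fisher-style weighting, the classifier is a nearest-centroid model, and queries use least-confidence uncertainty sampling. All function names (`fit_feature_extractor`, `active_learning`, etc.) are hypothetical.

```python
import math
import random


def fit_feature_extractor(X, y):
    """Toy stand-in for re-training a feature extractor on the labelled set.

    Weights each raw input dimension by its between-class / within-class
    variance ratio (a Fisher-style score), rescaled so the largest weight is 1.
    """
    classes = sorted(set(y))
    weights = []
    for d in range(len(X[0])):
        overall = sum(x[d] for x in X) / len(X)
        between = within = 0.0
        for c in classes:
            vals = [x[d] for x, lab in zip(X, y) if lab == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        weights.append(between / (within + 1e-9))
    top = max(weights) or 1.0
    return [w / top for w in weights]


def extract(x, weights):
    """Apply the current (toy) feature extractor to one raw input."""
    return [w * v for w, v in zip(weights, x)]


def fit_classifier(F, y):
    """Nearest-centroid classifier on extracted features."""
    centroids = {}
    for c in set(y):
        rows = [f for f, lab in zip(F, y) if lab == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_proba(f, centroids):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(f, m)) for c, m in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train(pool_X, labelled):
    """Refit BOTH the feature extractor and the classifier on the labelled set."""
    idx = sorted(labelled)
    X = [pool_X[i] for i in idx]
    y = [labelled[i] for i in idx]
    weights = fit_feature_extractor(X, y)
    centroids = fit_classifier([extract(x, weights) for x in X], y)
    return weights, centroids


def active_learning(pool_X, oracle, seed_idx, rounds, batch):
    """Pool-based AL loop with the feature extractor refitted every round."""
    labelled = {i: oracle(i) for i in seed_idx}
    for _ in range(rounds):
        weights, centroids = train(pool_X, labelled)
        unlabelled = [i for i in range(len(pool_X)) if i not in labelled]

        def confidence(i):
            probs = predict_proba(extract(pool_X[i], weights), centroids)
            return max(probs.values())

        # Least-confidence sampling: query the inputs the model is least sure about.
        unlabelled.sort(key=confidence)
        for i in unlabelled[:batch]:
            labelled[i] = oracle(i)  # stands in for one manual annotation
    weights, centroids = train(pool_X, labelled)
    return labelled, weights, centroids


if __name__ == "__main__":
    rng = random.Random(0)
    true_labels = [i % 2 for i in range(40)]
    # Dimension 0 separates the classes; dimension 1 is pure noise.
    pool_X = [[(3.0 if lab else -3.0) + rng.gauss(0, 0.5), rng.gauss(0, 3.0)]
              for lab in true_labels]
    labelled, weights, _ = active_learning(
        pool_X, lambda i: true_labels[i], seed_idx=[0, 1, 2, 3], rounds=3, batch=4)
    print(len(labelled), "labels used; feature weights:", weights)
```

In this toy run, refitting the extractor each round lets it down-weight the noisy dimension as more labels arrive. This mirrors, in miniature, why coupling feature learning to the AL loop can make each annotation more informative. A real system would replace the weighting with a DNN trained on raw audio.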