Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation

The ongoing biodiversity crisis, driven by factors such as land-use change and global warming, emphasizes the need for effective ecological monitoring methods. Acoustic monitoring has emerged as an important tool for tracking biodiversity. Detecting human voices in soundscape monitoring projects is useful both for analysing human disturbance and for privacy filtering. Despite significant strides in deep learning in recent years, deploying large neural networks on compact devices remains challenging because of memory and latency constraints. Our approach uses knowledge distillation to design efficient, lightweight student models for speech detection in bioacoustics. In particular, we used the MobileNetV3-Small-Pi model to create compact yet effective student architectures and compared them against the larger EcoVAD teacher model, a well-regarded voice detection architecture in eco-acoustic monitoring. The comparison covered several configurations of the MobileNetV3-Small-Pi-derived student models to identify the best-performing one. We also evaluated different distillation techniques to determine the most effective method for model selection. The distilled models performed comparably to the EcoVAD teacher model, which indicates a promising way to overcome computational barriers to real-time ecological monitoring.
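The abstract does not say which distillation objectives were compared, so the following is only a minimal sketch of one common formulation (temperature-scaled soft targets blended with the hard-label loss), adapted to a binary speech / no-speech output. The function names and the `temperature` and `alpha` parameters are illustrative assumptions, not taken from the paper or from the EcoVAD code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(target: float, prob: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a (possibly soft) target and a probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(
    student_logit: float,
    teacher_logit: float,
    label: float,
    temperature: float = 2.0,  # assumed value, for illustration only
    alpha: float = 0.5,        # assumed weight on the hard-label term
) -> float:
    """Blend of hard-label loss and temperature-softened teacher loss."""
    # Hard term: the student is scored against the ground-truth label.
    hard = bce(label, sigmoid(student_logit))
    # Soft term: both logits are divided by the temperature, and the
    # student is scored against the teacher's softened probability.
    soft_target = sigmoid(teacher_logit / temperature)
    soft_pred = sigmoid(student_logit / temperature)
    soft = bce(soft_target, soft_pred)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

In this sketch a student logit that agrees with the teacher yields a lower loss than one that contradicts it, and setting `alpha=1.0` reduces the objective to ordinary supervised training without a teacher.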
