The ongoing biodiversity crisis, driven by factors such as land-use change and global warming, underscores the need for effective ecological monitoring methods. Acoustic monitoring has emerged as an important tool for tracking biodiversity. Detecting human voices in soundscape monitoring projects is useful both for analysing human disturbance and for privacy filtering. Despite significant strides in deep learning in recent years, deploying large neural networks on compact devices remains challenging because of memory and latency constraints. Our approach uses knowledge distillation to design efficient, lightweight student models for speech detection in bioacoustics. In particular, we used the MobileNetV3-Small-Pi model to build compact yet effective student architectures and compared them against the larger EcoVAD teacher model, a well-regarded voice detection architecture in eco-acoustic monitoring. We examined various configurations of the MobileNetV3-Small-Pi-derived student models to identify the best-performing one, and we evaluated different distillation techniques to determine which is most effective for model selection. The distilled models performed comparably to the EcoVAD teacher model, which suggests a promising way to overcome computational barriers to real-time ecological monitoring.