Most work on audio enhancement targets human speech. Bioacoustic enhancement has received far less attention, because field recordings are noisy and animal vocalizations have distinctive acoustic characteristics. To address this gap, we adapt speech enhancement techniques to build BioSEN, a model designed specifically for bioacoustic signals. BioSEN has three modules. A multi-scale dual-axis attention unit extracts time-frequency features. A bio-harmonic multi-scale enhancement unit captures harmonic structure. An energy-adaptive gating connection unit applies frequency-dependent weights so that vocalizations are not suppressed as noise. Experiments on three bioacoustic datasets show that BioSEN matches or outperforms state-of-the-art speech enhancement models while requiring substantially less computation. These results demonstrate BioSEN's effectiveness for bioacoustic audio enhancement and its potential for biodiversity monitoring and conservation.