Animal vocalization denoising is a task similar to human speech enhancement, a well-studied field of research. In contrast to speech enhancement, it must handle a greater diversity of sound production mechanisms and recording environments, and this diversity challenges existing models. Compounding the challenge, and unlike for speech, we lack large and diverse datasets of clean vocalizations. As a solution, we train on pseudo-clean targets, i.e. pre-denoised vocalizations, and on segments of background noise that contain no vocalization. We propose a training set derived from bioacoustics datasets and repositories representing diverse species, acoustic environments, and geographic regions. Additionally, we introduce a non-overlapping benchmark set comprising clean vocalizations from different taxa and noise samples. We show that denoising models (demucs, CleanUNet) trained on pseudo-clean targets obtained with speech enhancement models achieve competitive results on the benchmark set. We publish data, code, libraries, and demos at https://mariusmiron.com/research/biodenoising.
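The abstract does not say how a pseudo-clean target and a noise segment are combined into a training example. As a minimal sketch, assuming the common speech-enhancement practice of additive mixing at a chosen signal-to-noise ratio, one (noisy input, target) pair could be built as follows. The function names, the 16 kHz sample rate, the synthetic signals, and the 5 dB SNR are all hypothetical and are not taken from the paper.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a non-empty signal."""
    if not signal:
        raise ValueError("signal must not be empty")
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(target, noise, snr_db):
    """Add `noise` to `target` after scaling it to reach the requested SNR.

    Assumption: additive mixing at a fixed SNR; the paper may build its
    training pairs differently.
    """
    if len(target) != len(noise):
        raise ValueError("target and noise must have the same length")
    target_power = mean_power(target)
    noise_power = mean_power(noise)
    if target_power == 0.0 or noise_power == 0.0:
        raise ValueError("target and noise must both be non-silent")
    # Choose gain so that target_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(target_power / (noise_power * 10 ** (snr_db / 10)))
    return [t + gain * n for t, n in zip(target, noise)]


random.seed(0)
sample_rate = 16000  # hypothetical sample rate

# Stand-in for a pseudo-clean target: one second of a 2 kHz tone.
pseudo_clean = [
    0.5 * math.sin(2 * math.pi * 2000 * i / sample_rate)
    for i in range(sample_rate)
]
# Stand-in for a background-noise segment without a vocalization.
noise_segment = [random.gauss(0.0, 0.1) for _ in range(sample_rate)]

# One training pair: the model sees `noisy_input` and is trained to output `pseudo_clean`.
noisy_input = mix_at_snr(pseudo_clean, noise_segment, snr_db=5.0)

# Sanity check: recover the SNR from the mixture and the target.
residual = [y - x for y, x in zip(noisy_input, pseudo_clean)]
achieved_snr_db = 10 * math.log10(mean_power(pseudo_clean) / mean_power(residual))
```

In practice the SNR would likely be sampled from a range for each example, so that the model is exposed to varied noise levels; that range is a design choice this sketch leaves open.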