This work aims to advance sound event detection (SED) research by presenting wild domestic environment sound event detection (WildDESED), a new dataset built with the help of large language models (LLMs). It extends the original DESED dataset to reflect the diverse acoustic variability and complex noise found in home settings. We used LLMs to generate eight different domestic scenarios based on the target sound categories of the DESED dataset. We then enriched these scenarios with a carefully tailored mixture of noises selected from AudioSet, ensuring no overlap with the target sounds. We evaluate the widely used convolutional recurrent neural network (CRNN) on WildDESED, and its performance highlights the dataset's challenging nature. We then apply curriculum learning, gradually increasing noise complexity during training, to improve the model's generalization across noise levels. Results with this approach show improved performance in noisy environments, validating its effectiveness on WildDESED and promoting advances in noise-robust SED.
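The idea of a noise curriculum can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mix_at_snr` and `curriculum_snr`, the use of signal-to-noise ratio (SNR) as the measure of noise complexity, and the specific SNR levels are all assumptions made for illustration. The sketch mixes a noise clip into a clean clip at a chosen SNR and picks progressively lower (harder) SNRs as training advances.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    # Repeat or trim the noise so it matches the clip length.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    # Choose the scale so that clean_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def curriculum_snr(epoch, total_epochs, snr_levels=(10.0, 5.0, 0.0, -5.0)):
    """Return the SNR for this epoch: easy (high SNR) first, hard (low SNR) last.

    The levels are hypothetical placeholders, not values taken from the paper.
    """
    stage = min(epoch * len(snr_levels) // total_epochs, len(snr_levels) - 1)
    return snr_levels[stage]
```

In a training loop, each batch would be mixed with `mix_at_snr(clip, noise, curriculum_snr(epoch, total_epochs))`, so the model sees mild noise early and the hardest conditions only in the final stage.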