This work explores class-incremental learning (CIL) for sound event detection (SED), with the aim of making SED models more adaptable to real-world scenarios. Inspired by the success of CIL in domains such as computer vision, we propose a method tailored to SED that addresses the unique challenges of diverse and complex audio environments. Our approach employs an independent unsupervised learning framework with a distillation loss function, which integrates new sound classes while preserving the consistency of the SED model across incremental tasks. We further enhance this framework with a sample selection strategy for unlabeled data and a balanced exemplar update mechanism, which together ensure varied and representative sound representations. We evaluate various continual learning methods on the DCASE 2023 Task 4 dataset, and the results offer insights into each method's applicability to real-world SED systems in which new sound classes may be added. The findings also outline future directions for CIL in dynamic audio settings.
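The abstract does not specify the form of the distillation loss. The sketch below is one possible illustration and assumes a Learning-without-Forgetting-style term adapted to multi-label SED. A frozen copy of the model from the previous task produces frame-level sigmoid outputs for the previously learned classes. The updated model is then penalised with binary cross-entropy whenever its outputs on those classes drift from the frozen model's. The function names, the `temperature` parameter, and the `[frame][class]` logit layout are hypothetical choices made for illustration; this is not the authors' implementation.

```python
import math
from typing import Sequence

# Hypothetical layout: logits[frame][class], old classes occupy the first columns.
Frames = Sequence[Sequence[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(target: float, pred: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with a soft target; pred is clipped to avoid log(0)."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(
    old_logits: Frames,
    new_logits: Frames,
    num_old_classes: int,
    temperature: float = 1.0,
) -> float:
    """Mean frame-level BCE between the frozen old model's soft outputs and the
    new model's outputs, computed only over classes seen in earlier tasks.
    Columns for newly added classes are ignored (assumption)."""
    total, count = 0.0, 0
    for old_frame, new_frame in zip(old_logits, new_logits):
        for c in range(num_old_classes):
            target = sigmoid(old_frame[c] / temperature)  # soft label from old model
            pred = sigmoid(new_frame[c] / temperature)    # current model's output
            total += bce(target, pred)
            count += 1
    return total / count if count else 0.0
```

In training, a term like this would typically be added, with a weighting factor, to the supervised loss on the new classes. The loss reaches its minimum, which is the entropy of the soft targets rather than zero, when the new model reproduces the old model's outputs on the old classes.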