This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and the Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset drawn from two sources, which makes it challenging to achieve good performance when the source of an audio clip is unknown during evaluation. To address this, we propose a sound event detection method based on domain generalization. Our approach integrates features from bidirectional encoder representations from audio transformers (BEATs) and a convolutional recurrent neural network (CRNN). We focus on three main strategies to improve our method. First, we apply MixStyle along the frequency dimension to adapt the mel-spectrograms from different domains. Second, we compute the training loss separately for each dataset, restricted to its corresponding classes. This independent learning framework helps the model extract domain-specific features effectively. Lastly, we use the sound event bounding boxes method for post-processing. Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.
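The first strategy, MixStyle along the frequency dimension, can be illustrated with a minimal NumPy sketch: per-frequency statistics (mean and standard deviation over time) of each mel-spectrogram are mixed with those of a randomly permuted batch, so the model sees interpolated domain styles during training. The function name, Beta-distribution parameter, and tensor layout below are illustrative assumptions; the submitted system's exact implementation may differ.

```python
import numpy as np

def freq_mixstyle(x, alpha=0.3, eps=1e-6, rng=None):
    """Frequency-wise MixStyle on a batch of log-mel spectrograms.

    x: array of shape (batch, freq, time).
    Per-frequency mean/std are computed over the time axis, then
    mixed with the statistics of a shuffled batch using a
    Beta(alpha, alpha)-distributed mixing weight.
    """
    rng = rng or np.random.default_rng()
    batch = x.shape[0]
    lam = rng.beta(alpha, alpha, size=(batch, 1, 1))   # per-sample mixing weight
    mu = x.mean(axis=2, keepdims=True)                 # (batch, freq, 1)
    sig = x.std(axis=2, keepdims=True) + eps
    x_norm = (x - mu) / sig                            # remove own style
    perm = rng.permutation(batch)                      # pair each sample with another
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                   # re-style with mixed stats
```

Applied only at training time, this leaves the spectrogram shape unchanged while perturbing the domain-dependent frequency statistics.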