Multitask frame-level learning for few-shot sound event detection - 专知论文

会员服务 ·

0

小样本学习 · Learning · F分数 · 线性的 · 稳健性 ·

2024 年 3 月 17 日

Multitask frame-level learning for few-shot sound event detection

翻译：多任务帧级学习用于少样本声音事件检测

Liang Zou,Genwei Yan,Ruoyu Wang,Jun Du,Meng Lei,Tian Gao,Xin Fang

from arxiv, 6 pages, 4 figures, conference

This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods methods in few-shot SED predominantly rely on segment-level predictions, which often providing detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduces an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves a F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2023.

翻译：本文聚焦于少样本声音事件检测任务，旨在利用有限的样本自动识别和分类声音事件。然而，当前少样本声音事件检测的主流方法主要依赖于片段级预测，其提供的细粒度预测往往不够精准，尤其是对于短时事件。尽管已有研究提出帧级预测策略以克服上述局限性，但这类策略常因背景噪声导致的预测截断问题而面临困难。为缓解该问题，本文提出了一种创新的多任务帧级声音事件检测框架。此外，我们引入TimeFilterAug——一种线性时序掩码数据增强方法，以提升模型对不同声学环境的鲁棒性与适应性。所提方法在2023年声场景与事件检测与分类挑战赛的少样本生物声学事件检测类别中取得了63.8%的F值，排名第一。

0

相关内容

小样本学习

小样本学习

小样本学习（Few-Shot Learning，以下简称 FSL ）用于解决当可用的数据量比较少时，如何提升神经网络的性能。在 FSL 中，经常用到的一类方法被称为 Meta-learning。和普通的神经网络的训练方法一样，Meta-learning 也包含训练过程和测试过程，但是它的训练过程被称作 Meta-training 和 Meta-testing。

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

14+阅读 · 2022年3月12日

【AAAI2022】面向多标签分类的端到端概率标签特征学习

【AAAI2022】面向多标签分类的端到端概率标签特征学习

专知会员服务

32+阅读 · 2022年1月27日

【KDD2021】多层次领域知识在分子图上的对比学习

专知会员服务

39+阅读 · 2021年6月13日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

RL解决'BipedalWalkerHardcore-v2' (SOTA)

RL解决'BipedalWalkerHardcore-v2' (SOTA)

CreateAMind

31+阅读 · 2019年7月17日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 使用变分推理做KBQA

论文浅尝 | 使用变分推理做KBQA

开放知识图谱

13+阅读 · 2018年4月15日

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

科技创新与创业

17+阅读 · 2017年11月17日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于DASH的交互式三维视频系统建模

国家自然科学基金

1+阅读 · 2015年12月31日

DMB信号水汽探测方法若干问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

直接优化半周长线长的VLSI两阶段迭代布局算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

MEMS数字地震检波器专用DSP芯片优化设计

国家自然科学基金

1+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

SHVC质量可伸缩视频编码的快速算法研究

国家自然科学基金

1+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

语义关联的地理视频数据自适应组织方法

国家自然科学基金

1+阅读 · 2014年12月31日

Integral equation methods for acoustic scattering by fractals

Arxiv

0+阅读 · 2024年4月30日

Self-supervised learning for classifying paranasal anomalies in the maxillary sinus

Arxiv

0+阅读 · 2024年4月29日

Structure-preserving particle methods for the Landau collision operator using the metriplectic framework

Arxiv

0+阅读 · 2024年4月29日

Estimation of uncertainties in the density driven flow in fractured porous media using MLMC

Arxiv

0+阅读 · 2024年4月27日

An automatic mixing speech enhancement system for multi-track audio

Arxiv

0+阅读 · 2024年4月27日

Separation capacity of linear reservoirs with random connectivity matrix

Arxiv

0+阅读 · 2024年4月26日

A machine-learning approach to thunderstorm forecasting through post-processing of simulation data

Arxiv

0+阅读 · 2024年4月26日

Trinity Detector:text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection

Arxiv

0+阅读 · 2024年4月26日

ResMLP: Feedforward networks for image classification with data-efficient training

Arxiv

12+阅读 · 2021年5月7日

Few-shot acoustic event detection via meta-learning

Arxiv

26+阅读 · 2020年2月21日

VIP会员

文章信息

相关主题

小样本学习

最新内容

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

0+阅读 · 今天15:20

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

0+阅读 · 今天15:18

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

8+阅读 · 今天5:53

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

4+阅读 · 今天5:45

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

2+阅读 · 今天5:23

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

2+阅读 · 今天5:11

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

6+阅读 · 今天5:04

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

5+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

9+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

10+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

5+阅读 · 7月25日

相关VIP内容

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

14+阅读 · 2022年3月12日

【AAAI2022】面向多标签分类的端到端概率标签特征学习

【AAAI2022】面向多标签分类的端到端概率标签特征学习

专知会员服务

32+阅读 · 2022年1月27日

【KDD2021】多层次领域知识在分子图上的对比学习

专知会员服务

39+阅读 · 2021年6月13日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

美空军新型反无人机部队初探

博士论文 | 面向大模型推理的内存高效算法

《无人系统互操作性导论——无人系统联合架构（JAUS）》

相关资讯

RL解决'BipedalWalkerHardcore-v2' (SOTA)

RL解决'BipedalWalkerHardcore-v2' (SOTA)

CreateAMind

31+阅读 · 2019年7月17日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 使用变分推理做KBQA

论文浅尝 | 使用变分推理做KBQA

开放知识图谱

13+阅读 · 2018年4月15日

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

科技创新与创业

17+阅读 · 2017年11月17日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

Integral equation methods for acoustic scattering by fractals

Arxiv

0+阅读 · 2024年4月30日

Self-supervised learning for classifying paranasal anomalies in the maxillary sinus

Arxiv

0+阅读 · 2024年4月29日

Structure-preserving particle methods for the Landau collision operator using the metriplectic framework

Arxiv

0+阅读 · 2024年4月29日

Estimation of uncertainties in the density driven flow in fractured porous media using MLMC

Arxiv

0+阅读 · 2024年4月27日

An automatic mixing speech enhancement system for multi-track audio

Arxiv

0+阅读 · 2024年4月27日

Separation capacity of linear reservoirs with random connectivity matrix

Arxiv

0+阅读 · 2024年4月26日

A machine-learning approach to thunderstorm forecasting through post-processing of simulation data

Arxiv

0+阅读 · 2024年4月26日

Trinity Detector:text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection

Arxiv

0+阅读 · 2024年4月26日

ResMLP: Feedforward networks for image classification with data-efficient training

Arxiv

12+阅读 · 2021年5月7日

Few-shot acoustic event detection via meta-learning

Arxiv

26+阅读 · 2020年2月21日

相关基金

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于DASH的交互式三维视频系统建模

国家自然科学基金

1+阅读 · 2015年12月31日

DMB信号水汽探测方法若干问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

直接优化半周长线长的VLSI两阶段迭代布局算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

MEMS数字地震检波器专用DSP芯片优化设计

国家自然科学基金

1+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

SHVC质量可伸缩视频编码的快速算法研究

国家自然科学基金

1+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

语义关联的地理视频数据自适应组织方法

国家自然科学基金

1+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员