The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis - 专知论文

会员服务 ·

0

Performer · Analysis · 多峰值 · Performance · 单峰值 ·

2023 年 3 月 4 日

The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis

翻译：DKU 后挑战时代的音视频唤醒词检测系统：2021年MISP挑战赛深度分析

Haoxu Wang,Ming Cheng,Qiang Fu,Ming Li

from arxiv, Accepted by ICASSP 2023

This paper further explores our previous wake word spotting system ranked 2-nd in Track 1 of the MISP Challenge 2021. First, we investigate a robust unimodal approach based on 3D and 2D convolution and adopt the simple attention module (SimAM) for our system to improve performance. Second, we explore different combinations of data augmentation methods for better performance. Finally, we study the fusion strategies, including score-level, cascaded and neural fusion. Our proposed multimodal system leverages multimodal features and uses the complementary visual information to mitigate the performance degradation of audio-only systems in complex acoustic scenarios. Our system obtains a false reject rate of 2.15% and a false alarm rate of 3.44% in the evaluation set of the competition database, which achieves the new state-of-the-art performance by 21% relative improvement compared to previous systems.

翻译：本文深入探讨了我们在2021年MISP挑战赛Track 1中排名第二的唤醒词检测系统。首先，我们研究了一种基于3D和2D卷积的鲁棒单模态方法，并引入简单注意力模块（SimAM）以提升系统性能。其次，我们探索了不同数据增强方法的组合以取得更优效果。最后，我们研究了包括分数级融合、级联融合和神经融合在内的融合策略。所提出的多模态系统利用多模态特征，借助互补的视觉信息来缓解复杂声学场景中纯音频系统的性能退化。该系统在比赛数据库的评测集上实现了2.15%的误拒率和3.44%的误报率，相较于先前系统取得了21%的相对性能提升，达到了新的最优水平。

0

相关内容

Performer

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

MIT-深度学习Deep Learning State of the Art in 2020，87页ppt

MIT-深度学习Deep Learning State of the Art in 2020，87页ppt

专知会员服务

62+阅读 · 2020年2月17日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

钙钛矿型太阳能电池稳定性的机理研究及其光伏器件性能提高

国家自然科学基金

0+阅读 · 2015年12月31日

宽谱敏化的Yb3+掺杂硅(氧)氮化物制备与近红外发光性能

国家自然科学基金

0+阅读 · 2014年12月31日

粉煤灰基沸石CHA制备及其分离工业废气中CO2的基础研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA-uc002mbe.2介导的HDACi凋亡效应及其在肝癌中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

自组装太阳能电池（二）

国家自然科学基金

0+阅读 · 2012年12月31日

磁性掺杂的哈斯勒半导体类薄膜的制备和反常霍尔效应研究

国家自然科学基金

0+阅读 · 2012年12月31日

Ag/Bi4Ti3O12-TiO2铁电/半导体异质结纳米管阵列的光伏特性及载流子输运机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

掺杂的TiO2纤维的制备及其光催化性能研究

国家自然科学基金

0+阅读 · 2009年12月31日

垃圾焚烧过程中多种重金属资源化回收的研究

国家自然科学基金

0+阅读 · 2008年12月31日

p-n复合半导体CoO/CdS敏化TiO2可见光催化分解水制氢

国家自然科学基金

0+阅读 · 2008年12月31日

Learning Partial Correlation based Deep Visual Representation for Image Classification

Arxiv

0+阅读 · 2023年4月26日

Foley Sound Synthesis at the DCASE 2023 Challenge

Arxiv

0+阅读 · 2023年4月26日

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

Arxiv

0+阅读 · 2023年4月25日

Exploring Compositional Visual Generation with Latent Classifier Guidance

Arxiv

0+阅读 · 2023年4月25日

Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices

Arxiv

0+阅读 · 2023年4月23日

VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges

Arxiv

11+阅读 · 2022年12月26日

Recent Advances in Deep Learning-based Dialogue Systems

Arxiv

18+阅读 · 2021年5月10日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Arxiv

14+阅读 · 2020年3月10日

VIP会员

文章信息

相关主题

最新内容

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

5+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

4+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

4+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

5+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

4+阅读 · 6月23日

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

6+阅读 · 6月22日

相关VIP内容

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

MIT-深度学习Deep Learning State of the Art in 2020，87页ppt

MIT-深度学习Deep Learning State of the Art in 2020，87页ppt

专知会员服务

62+阅读 · 2020年2月17日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

重新思考无人机时代的生存能力

在人工智能加速决策环境中拓展OODA循环

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

装甲突击旅：现代战争思考、战斗与组织

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Learning Partial Correlation based Deep Visual Representation for Image Classification

Arxiv

0+阅读 · 2023年4月26日

Foley Sound Synthesis at the DCASE 2023 Challenge

Arxiv

0+阅读 · 2023年4月26日

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

Arxiv

0+阅读 · 2023年4月25日

Exploring Compositional Visual Generation with Latent Classifier Guidance

Arxiv

0+阅读 · 2023年4月25日

Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices

Arxiv

0+阅读 · 2023年4月23日

VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges

Arxiv

11+阅读 · 2022年12月26日

Recent Advances in Deep Learning-based Dialogue Systems

Arxiv

18+阅读 · 2021年5月10日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Arxiv

14+阅读 · 2020年3月10日

相关基金

钙钛矿型太阳能电池稳定性的机理研究及其光伏器件性能提高

国家自然科学基金

0+阅读 · 2015年12月31日

宽谱敏化的Yb3+掺杂硅(氧)氮化物制备与近红外发光性能

国家自然科学基金

0+阅读 · 2014年12月31日

粉煤灰基沸石CHA制备及其分离工业废气中CO2的基础研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA-uc002mbe.2介导的HDACi凋亡效应及其在肝癌中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

自组装太阳能电池（二）

国家自然科学基金

0+阅读 · 2012年12月31日

磁性掺杂的哈斯勒半导体类薄膜的制备和反常霍尔效应研究

国家自然科学基金

0+阅读 · 2012年12月31日

Ag/Bi4Ti3O12-TiO2铁电/半导体异质结纳米管阵列的光伏特性及载流子输运机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

掺杂的TiO2纤维的制备及其光催化性能研究

国家自然科学基金

0+阅读 · 2009年12月31日

垃圾焚烧过程中多种重金属资源化回收的研究

国家自然科学基金

0+阅读 · 2008年12月31日

p-n复合半导体CoO/CdS敏化TiO2可见光催化分解水制氢

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员