Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding - 专知论文

会员服务 ·

0

INFORMS · 多峰值 · 语音识别 · 解码 · Performer ·

2023 年 5 月 23 日

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

翻译：基于声学与语义协同解码的多模态视角重新审视语音识别

Tian-Hao Zhang,Hai-Bo Qin,Zhi-Hao Lai,Song-Lu Chen,Qi Liu,Feng Chen,Xinyuan Qian,Xu-Cheng Yin

from arxiv, Accepted by Interspeech 2023

Attention-based encoder-decoder (AED) models have shown impressive performance in ASR. However, most existing AED methods neglect to simultaneously leverage both acoustic and semantic features in decoder, which is crucial for generating more accurate and informative semantic states. In this paper, we propose an Acoustic and Semantic Cooperative Decoder (ASCD) for ASR. In particular, unlike vanilla decoders that process acoustic and semantic features in two separate stages, ASCD integrates them cooperatively. To prevent information leakage during training, we design a Causal Multimodal Mask. Moreover, a variant Semi-ASCD is proposed to balance accuracy and computational cost. Our proposal is evaluated on the publicly available AISHELL-1 and aidatatang_200zh datasets using Transformer, Conformer, and Branchformer as encoders, respectively. The experimental results show that ASCD significantly improves the performance by leveraging both the acoustic and semantic information cooperatively.

翻译：基于注意力机制的编码器-解码器（AED）模型在自动语音识别（ASR）中展现出卓越性能。然而，现有大多数AED方法未能同时在解码器中充分利用声学与语义特征，而这对于生成更准确且富含信息的语义状态至关重要。本文提出一种面向ASR的声学与语义协同解码器（ASCD）。具体而言，与将声学与语义特征分两阶段处理的传统解码器不同，ASCD将其进行协同整合。为防止训练过程中的信息泄露，我们设计了因果多模态掩码。此外，为平衡准确性与计算成本，我们提出了变体模型Semi-ASCD。该方案分别在AISHELL-1及aidatatang_200zh公开数据集上，以Transformer、Conformer和Branchformer作为编码器进行评估。实验结果表明，ASCD通过协同利用声学与语义信息显著提升了系统性能。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

Zero-Shot Learning相关资源大列表

Zero-Shot Learning相关资源大列表

专知

52+阅读 · 2019年1月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

专知

10+阅读 · 2018年3月2日

miR-29b/STAT3/GATA3正反馈环路逆转TAM极化并抑制尤文肉瘤恶性表型的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

ICOSL诱导三阴性乳腺癌转移的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于靶向多肽探针的循环肿瘤细胞高选择性分析探索

国家自然科学基金

0+阅读 · 2014年12月31日

mTOR功能性单倍体通过ERS-IRE1/α-JNK通路调控乳腺癌细胞药物敏感性的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

CD4+CD25+Foxp3+调节T细胞介导的急性髓系白血病免疫逃逸机制研究

国家自然科学基金

1+阅读 · 2013年12月31日

拟南芥Argonaute1在细胞核内调控基因表达的机制

国家自然科学基金

0+阅读 · 2013年12月31日

髓系抑制性细胞（MDSC）参与鼻咽癌免疫耐受的作用和调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

Legumain在乳腺癌骨转移和破骨损伤过程中的作用机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

新BRCA1剪接异构体在乳腺癌细胞中的功能研究

国家自然科学基金

0+阅读 · 2008年12月31日

On the Effectiveness of Speech Self-supervised Learning for Music

Arxiv

0+阅读 · 2023年7月11日

Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels

Arxiv

0+阅读 · 2023年7月11日

Semantic-SAM: Segment and Recognize Anything at Any Granularity

Arxiv

0+阅读 · 2023年7月10日

Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model

Arxiv

0+阅读 · 2023年7月7日

Spatially Consistent Representation Learning

Arxiv

14+阅读 · 2021年3月10日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

Deep Semantic Role Labeling with Self-Attention

Arxiv

13+阅读 · 2017年12月5日

VIP会员

文章信息

相关主题

最新内容

“史诗怒火”行动：现代多域作战的重要节点

“史诗怒火”行动：现代多域作战的重要节点

专知会员服务

3+阅读 · 今天5:05

《下一代无线网络中的多无人机通信资源管理》

《下一代无线网络中的多无人机通信资源管理》

专知会员服务

3+阅读 · 今天5:00

《高分辨率模拟下的聚合战斗建模：以“会战交锋”场景为例》

《高分辨率模拟下的聚合战斗建模：以“会战交锋”场景为例》

专知会员服务

3+阅读 · 今天4:52

《人机协同在安全关键型操作决策中的应用》120页

《人机协同在安全关键型操作决策中的应用》120页

专知会员服务

2+阅读 · 今天4:43

网络防御与空中力量网络防护：21世纪空中力量历史与理论的启示

网络防御与空中力量网络防护：21世纪空中力量历史与理论的启示

专知会员服务

2+阅读 · 今天1:47

综述 | Memory for Large Language Models：大模型记忆机制全景

综述 | Memory for Large Language Models：大模型记忆机制全景

专知会员服务

6+阅读 · 7月29日

博士论文 | Riemannian Deep Learning：模块、网络与几何

博士论文 | Riemannian Deep Learning：模块、网络与几何

专知会员服务

2+阅读 · 7月29日

《越野作战环境下路径规划的多准则整数规划模型》

《越野作战环境下路径规划的多准则整数规划模型》

专知会员服务

9+阅读 · 7月29日

人工智能大语言模型引擎如何重塑全球冲突信息环境最新50页

人工智能大语言模型引擎如何重塑全球冲突信息环境最新50页

专知会员服务

7+阅读 · 7月29日

《防空系统对自主武器系统辩论中“有意义的人类控制”的启示》70页报告

《防空系统对自主武器系统辩论中“有意义的人类控制”的启示》70页报告

专知会员服务

6+阅读 · 7月29日

“对标ChatGPT”：乌军研发Marichka AI系统用于战场筹划

“对标ChatGPT”：乌军研发Marichka AI系统用于战场筹划

专知会员服务

10+阅读 · 7月29日

《同步多无人机系统中的故障与通信》

《同步多无人机系统中的故障与通信》

专知会员服务

4+阅读 · 7月29日

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

专知会员服务

5+阅读 · 7月28日

博士论文 | 从算法到基础模型：强化学习的统一视角

博士论文 | 从算法到基础模型：强化学习的统一视角

专知会员服务

11+阅读 · 7月28日

面向国防作战的最佳自主与蜂群无人机技术

面向国防作战的最佳自主与蜂群无人机技术

专知会员服务

7+阅读 · 7月28日

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《下一代无线网络中的多无人机通信资源管理》

《人机协同在安全关键型操作决策中的应用》120页

“史诗怒火”行动：现代多域作战的重要节点

《高分辨率模拟下的聚合战斗建模：以“会战交锋”场景为例》

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

Zero-Shot Learning相关资源大列表

Zero-Shot Learning相关资源大列表

专知

52+阅读 · 2019年1月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

专知

10+阅读 · 2018年3月2日

相关论文

On the Effectiveness of Speech Self-supervised Learning for Music

Arxiv

0+阅读 · 2023年7月11日

Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels

Arxiv

0+阅读 · 2023年7月11日

Semantic-SAM: Segment and Recognize Anything at Any Granularity

Arxiv

0+阅读 · 2023年7月10日

Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model

Arxiv

0+阅读 · 2023年7月7日

Spatially Consistent Representation Learning

Arxiv

14+阅读 · 2021年3月10日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

Deep Semantic Role Labeling with Self-Attention

Arxiv

13+阅读 · 2017年12月5日

相关基金

miR-29b/STAT3/GATA3正反馈环路逆转TAM极化并抑制尤文肉瘤恶性表型的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

ICOSL诱导三阴性乳腺癌转移的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于靶向多肽探针的循环肿瘤细胞高选择性分析探索

国家自然科学基金

0+阅读 · 2014年12月31日

mTOR功能性单倍体通过ERS-IRE1/α-JNK通路调控乳腺癌细胞药物敏感性的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

CD4+CD25+Foxp3+调节T细胞介导的急性髓系白血病免疫逃逸机制研究

国家自然科学基金

1+阅读 · 2013年12月31日

拟南芥Argonaute1在细胞核内调控基因表达的机制

国家自然科学基金

0+阅读 · 2013年12月31日

髓系抑制性细胞（MDSC）参与鼻咽癌免疫耐受的作用和调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

Legumain在乳腺癌骨转移和破骨损伤过程中的作用机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

新BRCA1剪接异构体在乳腺癌细胞中的功能研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员