Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation - 专知论文

会员服务 ·

0

估计/估计量 · 有偏 · MoDELS · 语音识别 · 语言模型化 ·

2023 年 5 月 5 日

Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

翻译：掩蔽偏差：基于CTC的ASR通过内部语言模型估计提升域自适应泛化能力

Nilaksh Das,Monica Sunkara,Sravan Bodapati,Jinglun Cai,Devang Kulshreshtha,Jeff Farris,Katrin Kirchhoff

from arxiv, Accepted to ICASSP 2023

End-to-end ASR models trained on large amount of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture, and eliminating the acoustic input to perform log-linear interpolation with the text-only posterior. However, for CTC-based ASR, it is not as straightforward to decouple the model into such acoustic and language components, as CTC log-posteriors are computed in a non-autoregressive manner. In this work, we propose a novel ILME technique for CTC-based ASR models. Our method iteratively masks the audio timesteps to estimate a pseudo log-likelihood of the internal LM by accumulating log-posteriors for only the masked timesteps. Extensive evaluation across multiple out-of-domain datasets reveals that the proposed approach improves WER by up to 9.8% and OOV F1-score by up to 24.6% relative to Shallow Fusion, when only text data from target domain is available. In the case of zero-shot domain adaptation, with no access to any target domain data, we demonstrate that removing the source domain bias with ILME can still outperform Shallow Fusion to improve WER by up to 9.3% relative.

翻译：端到端自动语音识别（ASR）模型在大规模数据上训练时，会隐式地偏向于训练数据的语言语义。已有研究提出内部语言模型估计（ILME）方法，用于减轻自回归模型（如注意力编码器-解码器与RNN-T）中的此类偏差。通常，ILME通过将模型架构中的声学与语言组件模块化，并消除声学输入以执行基于纯文本后验的对数线性插值来实现。然而，对于基于CTC的ASR模型而言，由于其对数后验概率以非自回归方式计算，将模型解耦为声学与语言组件并不直接。本文提出了一种面向CTC型ASR模型的新型ILME技术。该方法通过迭代掩蔽音频时间步，仅累加被掩蔽时间步的对数后验概率，以估计内部语言模型的伪对数似然。在多个跨域数据集上的广泛评估表明：当仅能获取目标域文本数据时，所提方法相比浅融合（Shallow Fusion）可降低最多9.8%的词错误率（WER），并提升最多24.6%的OOV F1分数；在零样本域自适应场景下（即无法获取任何目标域数据），使用ILME消除源域偏差仍能优于浅融合，实现最多9.3%的相对WER降低。

0

相关内容

估计/估计量

估计/估计量

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

专知会员服务

35+阅读 · 2022年3月5日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

46+阅读 · 2020年10月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

2-十三烷酮诱导的棉铃虫HaTrf基因调控细胞凋亡的分子机制

国家自然科学基金

0+阅读 · 2016年12月31日

同型半胱氨酸经组蛋白和DNA甲基化相互作用调控ERO1α促内质网应激的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

混凝土Weibull统计尺寸效应理论模型改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

CoFe2O4/BaSrTiO3复合势垒多铁隧道结的制备及隧穿特性研究

国家自然科学基金

0+阅读 · 2013年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

煤/聚乙烯亚胺交联复合螯合吸附剂制备及其对重金属离子的协同作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

水稻三萜代谢物多样性的进化机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

hTERT启动子和缺氧诱导因子-1双调控肿瘤特异性溶瘤腺病毒介导的超抗原SEA基因靶向灌注治疗膀胱癌

国家自然科学基金

0+阅读 · 2009年12月31日

复合污染条件下DOM对典型离子性抗生素吸附迁移行为的影响

国家自然科学基金

0+阅读 · 2008年12月31日

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Arxiv

0+阅读 · 2023年6月21日

A unified analysis of likelihood-based estimators in the Plackett--Luce model

Arxiv

0+阅读 · 2023年6月20日

Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding

Arxiv

0+阅读 · 2023年6月19日

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Arxiv

1+阅读 · 2023年6月18日

Adversaries with Limited Information in the Friedkin--Johnsen Model

Arxiv

0+阅读 · 2023年6月17日

Adaptive Strategies in Non-convex Optimization

Arxiv

0+阅读 · 2023年6月17日

High-Dimensional MR Reconstruction Integrating Subspace and Adaptive Generative Models

Arxiv

0+阅读 · 2023年6月16日

Improving Audio Caption Fluency with Automatic Error Correction

Arxiv

0+阅读 · 2023年6月16日

Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

Arxiv

0+阅读 · 2023年6月15日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Arxiv

10+阅读 · 2018年3月8日

VIP会员

文章信息

相关主题

估计/估计量

语言模型化

最新内容

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

1+阅读 · 今天14:45

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

1+阅读 · 今天14:43

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

3+阅读 · 今天14:31

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

3+阅读 · 今天14:20

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

2+阅读 · 今天14:11

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

3+阅读 · 今天14:07

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

3+阅读 · 今天14:03

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

2+阅读 · 今天13:59

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

5+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

8+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

7+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

4+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

5+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

5+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

8+阅读 · 6月22日

相关VIP内容

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

专知会员服务

35+阅读 · 2022年3月5日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

46+阅读 · 2020年10月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 世界动作模型：少做梦，多行动

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

美以伊冲突：无人机与人工智能的运用

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Arxiv

0+阅读 · 2023年6月21日

A unified analysis of likelihood-based estimators in the Plackett--Luce model

Arxiv

0+阅读 · 2023年6月20日

Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding

Arxiv

0+阅读 · 2023年6月19日

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Arxiv

1+阅读 · 2023年6月18日

Adversaries with Limited Information in the Friedkin--Johnsen Model

Arxiv

0+阅读 · 2023年6月17日

Adaptive Strategies in Non-convex Optimization

Arxiv

0+阅读 · 2023年6月17日

High-Dimensional MR Reconstruction Integrating Subspace and Adaptive Generative Models

Arxiv

0+阅读 · 2023年6月16日

Improving Audio Caption Fluency with Automatic Error Correction

Arxiv

0+阅读 · 2023年6月16日

Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

Arxiv

0+阅读 · 2023年6月15日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Arxiv

10+阅读 · 2018年3月8日

相关基金

2-十三烷酮诱导的棉铃虫HaTrf基因调控细胞凋亡的分子机制

国家自然科学基金

0+阅读 · 2016年12月31日

同型半胱氨酸经组蛋白和DNA甲基化相互作用调控ERO1α促内质网应激的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

混凝土Weibull统计尺寸效应理论模型改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

CoFe2O4/BaSrTiO3复合势垒多铁隧道结的制备及隧穿特性研究

国家自然科学基金

0+阅读 · 2013年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

煤/聚乙烯亚胺交联复合螯合吸附剂制备及其对重金属离子的协同作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

水稻三萜代谢物多样性的进化机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

hTERT启动子和缺氧诱导因子-1双调控肿瘤特异性溶瘤腺病毒介导的超抗原SEA基因靶向灌注治疗膀胱癌

国家自然科学基金

0+阅读 · 2009年12月31日

复合污染条件下DOM对典型离子性抗生素吸附迁移行为的影响

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员