MAC: A unified framework boosting low resource automatic speech recognition

Speech recognition · MAC · Concatenation · Automatic speech recognition · Boosting

February 15, 2023


Zeping Min,Qian Ge,Zhong Li,Weinan E

We propose meta audio concatenation (MAC), a unified framework for low-resource automatic speech recognition (ASR). It is easy to implement and can be applied in extremely low-resource settings. We give a mathematical description of the MAC framework from the perspective of Bayesian sampling. Within this framework, a novel concatenative-synthesis text-to-speech (TTS) system is used to boost the low-resource ASR task. The concatenative TTS system makes it possible to integrate language pronunciation rules and to adjust the TTS process. We further propose a broad notion of a meta audio set to meet the modeling needs of different languages and scenarios. Extensive experiments demonstrate the effectiveness of MAC on low-resource ASR tasks. On Cantonese, Taiwanese, and Japanese ASR tasks, MAC reduces the character error rate (CER) by more than 15% under the CTC greedy search, CTC prefix, attention, and attention rescoring decoding modes. Moreover, MAC outperforms fine-tuned wav2vec2 on the Common Voice Cantonese dataset and achieves competitive results on the Common Voice Taiwanese and Japanese datasets. In particular, we achieve a 10.9% CER on the Common Voice Cantonese ASR task, a relative improvement of about 30% over fine-tuned wav2vec2.
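The abstract does not spell out how the concatenative TTS step works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the meta audio set can be modelled as a dictionary from pronunciation units to short recorded waveforms, and that pronunciation rules can be modelled as a character-to-unit lexicon. The names `text_to_units`, `synthesize`, and `augment`, the Jyutping syllables, and the linear crossfade are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Waveform = List[float]  # mono samples; a real system would use arrays


def text_to_units(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Map each character to a pronunciation unit using a (hypothetical) lexicon."""
    return [lexicon[ch] for ch in text]


def synthesize(
    units: Sequence[str],
    meta_audio: Dict[str, Waveform],
    crossfade: int = 0,
) -> Waveform:
    """Concatenate the recorded clip of each unit, optionally crossfading joins."""
    out: Waveform = []
    for unit in units:
        if unit not in meta_audio:
            raise KeyError(f"no meta audio recorded for unit {unit!r}")
        clip = meta_audio[unit]
        # Overlap at most `crossfade` samples, limited by what is available.
        n = min(crossfade, len(out), len(clip))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # linear fade-in weight for the new clip
            out[start + i] = (1.0 - w) * out[start + i] + w * clip[i]
        out.extend(clip[n:])
    return out


def augment(
    texts: Sequence[str],
    lexicon: Dict[str, str],
    meta_audio: Dict[str, Waveform],
) -> List[Tuple[Waveform, str]]:
    """Build synthetic (audio, transcript) pairs to add to ASR training data."""
    return [(synthesize(text_to_units(t, lexicon), meta_audio), t) for t in texts]


if __name__ == "__main__":
    # Toy data: two Cantonese characters with Jyutping syllables as units.
    lexicon = {"你": "nei5", "好": "hou2"}
    meta_audio = {"nei5": [0.1, 0.2, 0.1], "hou2": [0.3, 0.2]}
    pairs = augment(["你好"], lexicon, meta_audio)
    print(len(pairs[0][0]), pairs[0][1])
```

Keeping the lexicon separate from the meta audio set is one way to reflect the abstract's point that pronunciation rules can be integrated and the TTS process adjusted: either table can be swapped per language or scenario without touching the concatenation step.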

