Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this with separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose NeKo, a Multi-Task Correction MoE, in which each expert is trained to become an ``expert'' on speech-to-text, language-to-text, or vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that NeKo sets a new state of the art, achieving an average relative $5.0$\% WER reduction and substantial BLEU-score improvements on speech and translation tasks. In zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus on the Hyporadise benchmark with $15.5$\% to $27.6$\% relative WER reduction. As a multi-task model, NeKo also performs competitively on grammar and post-OCR correction.
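The core idea of dataset-mapped routing can be illustrated with a minimal sketch. This is not the paper's implementation: the task names, the fixed task-to-expert mapping, and the toy expert functions below are all hypothetical, and a real model would use learned gating at inference rather than hard assignment.

```python
# Minimal sketch of task-guided MoE routing (hypothetical, illustrative only):
# during training, tokens from each dataset are hard-routed to that dataset's
# mapped expert, so each expert specializes in one correction task.

TASK_TO_EXPERT = {"asr": 0, "translation": 1, "ocr": 2}  # assumed mapping

def make_experts():
    # Toy "experts": each applies a distinct transformation to a token vector,
    # standing in for the per-task feed-forward experts of a real MoE layer.
    return [
        lambda x: [v + 1.0 for v in x],   # expert 0: speech-to-text
        lambda x: [v * 2.0 for v in x],   # expert 1: language-to-text
        lambda x: [v - 1.0 for v in x],   # expert 2: vision-to-text
    ]

def route(tokens, task, expert_fns):
    """Hard-route every token of a batch to the expert mapped to its dataset."""
    idx = TASK_TO_EXPERT[task]
    return [expert_fns[idx](tok) for tok in tokens]

batch = [[0.5, -0.5], [1.0, 0.0]]       # two toy token embeddings
out = route(batch, "asr", make_experts())
```

At inference time, when the source dataset is unknown, a learned router would replace the table lookup, selecting the top-scoring expert per token.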