Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this with separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose NeKo, a Multi-Task Correction MoE, in which each expert is trained to become an ``expert'' on speech-to-text, language-to-text, or vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that NeKo sets a new state of the art, achieving an average relative $5.0$\% WER reduction and substantial BLEU-score improvements on speech and translation tasks. In zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus on the Hyporadise benchmark with $15.5$\% to $27.6$\% relative WER reduction. As a multi-task model, NeKo also performs competitively on grammar and post-OCR correction.
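The core idea of dataset-mapped routing can be illustrated with a minimal sketch. This is not the paper's implementation: the task names, the fixed task-to-expert mapping, and the toy expert functions below are all hypothetical, and a real model would use learned gating at inference rather than hard assignment.

```python
# Minimal sketch of task-guided MoE routing (hypothetical, illustrative only):
# during training, tokens from each dataset are hard-routed to that dataset's
# mapped expert, so each expert specializes in one correction task.

TASK_TO_EXPERT = {"asr": 0, "translation": 1, "ocr": 2}  # assumed mapping

def make_experts():
    # Toy "experts": each applies a distinct transformation to a token vector,
    # standing in for the per-task feed-forward experts of a real MoE layer.
    return [
        lambda x: [v + 1.0 for v in x],   # expert 0: speech-to-text
        lambda x: [v * 2.0 for v in x],   # expert 1: language-to-text
        lambda x: [v - 1.0 for v in x],   # expert 2: vision-to-text
    ]

def route(tokens, task, expert_fns):
    """Hard-route every token of a batch to the expert mapped to its dataset."""
    idx = TASK_TO_EXPERT[task]
    return [expert_fns[idx](tok) for tok in tokens]

batch = [[0.5, -0.5], [1.0, 0.0]]       # two toy token embeddings
out = route(batch, "asr", make_experts())
```

At inference time, when the source dataset is unknown, a learned router would replace the table lookup, selecting the top-scoring expert per token.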