Advancing STT for Low-Resource Real-World Speech

Swiss German is a low-resource language represented by diverse dialects that differ significantly from Standard German and from each other, lacking a standardized written form. As a result, transcribing Swiss German involves translating into Standard German. Existing datasets have been collected in controlled environments, yielding effective speech-to-text (STT) models, but these models struggle with spontaneous conversational speech. This paper, therefore, introduces the new SRB-300 dataset, a 300-hour annotated speech corpus featuring real-world long-audio recordings from 39 Swiss German radio and TV stations. It captures spontaneous speech across all major Swiss dialects recorded in various realistic environments and overcomes the limitation of prior sentence-level corpora. We fine-tuned multiple OpenAI Whisper models on the SRB-300 dataset, achieving notable enhancements over previous zero-shot performance metrics. Improvements in word error rate (WER) ranged from 19% to 33%, while BLEU scores increased between 8% and 40%. The best fine-tuned model, large-v3, achieved a WER of 17.1% and a BLEU score of 74.8. This advancement is crucial for developing effective and robust STT systems for Swiss German and other low-resource languages in real-world contexts.
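For readers unfamiliar with the headline metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the paper's evaluation code. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the abstract does not state which text normalization was applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("das ist ein test", "das ist test"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.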
