This paper presents Fotheidil, the first web-based transcription system for the Irish language, which uses speech-related AI technologies developed as part of the ABAIR initiative. The system combines off-the-shelf, pre-trained models for voice activity detection and speaker diarisation with models trained specifically for Irish automatic speech recognition (ASR) and for capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements on out-of-domain test sets and on dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration using sequence-to-sequence models is compared with the conventional approach using a classification model; here too, experimental results show substantial improvements in performance. The system will be made freely available for public use and represents an important resource for researchers and others who transcribe Irish-language materials. As the system is used, human-corrected transcriptions will be collected and added to the training data, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.
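The two framings of capitalisation and punctuation restoration compared above differ mainly in what the model is trained to predict. The sketch below is a hypothetical illustration, not the Fotheidil implementation. It shows how one reference sentence could be turned into training targets for each framing. In the classification framing, each lowercased, unpunctuated token receives a label. Here we assume a simple label pair of (first letter capitalised, trailing punctuation mark). In a sequence-to-sequence framing, the model is typically trained to map the normalised string directly to the cased, punctuated string. The punctuation set `PUNCT` and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical label set: trailing punctuation marks the model must restore.
PUNCT = ".,?!"

Label = Tuple[bool, str]  # (first letter capitalised?, trailing punctuation or "")


def classification_targets(text: str) -> Tuple[List[str], List[Label]]:
    """Split a reference sentence into ASR-style tokens and per-token labels."""
    tokens: List[str] = []
    labels: List[Label] = []
    for word in text.split():
        core = word.rstrip(PUNCT)
        if not core:  # skip tokens that are punctuation only (simplification)
            continue
        punct = word[len(core):]  # trailing punctuation, "" if none
        tokens.append(core.lower())  # simulate lowercase ASR output
        labels.append((core[0].isupper(), punct))
    return tokens, labels


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Rebuild cased, punctuated text from tokens and predicted labels."""
    out = []
    for tok, (cap, punct) in zip(tokens, labels):
        word = tok[:1].upper() + tok[1:] if cap else tok
        out.append(word + punct)
    return " ".join(out)


def seq2seq_pair(text: str) -> Tuple[str, str]:
    """Source/target pair for a sequence-to-sequence model."""
    tokens, _ = classification_targets(text)
    return " ".join(tokens), text


if __name__ == "__main__":
    ref = "Dia duit, a Sheáin. Conas atá tú?"
    toks, labs = classification_targets(ref)
    print(toks)
    print(labs)
    print(apply_labels(toks, labs))
    print(seq2seq_pair(ref))
```

This simple first-letter flag cannot represent Irish forms with a capital after an initial mutation prefix, such as *na hÉireann*. A classification model would need a richer label set for these. A sequence-to-sequence target carries them implicitly because it is the full output string.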