This work explores fine-tuning OpenAI's Whisper automatic speech recognition (ASR) model for Amharic, a low-resource language, to improve transcription accuracy. While the foundational Whisper model struggles with Amharic due to its limited representation in the training data, we fine-tune it using datasets such as Mozilla Common Voice, FLEURS, and the BDU-speech dataset. The best-performing model, Whispersmall-am, improves significantly when fine-tuned on a mix of existing FLEURS data and new, previously unseen Amharic datasets. Training solely on the new data leads to poor performance, but combining it with FLEURS data reinforces the model, enabling better specialization in Amharic. We also demonstrate that normalizing Amharic homophones significantly improves Word Error Rate (WER) and Bilingual Evaluation Understudy (BLEU) scores. This study underscores the importance of fine-tuning strategies and dataset composition for improving ASR in low-resource languages, and offers insights for future Amharic speech recognition research.
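The homophone normalization step can be illustrated with a minimal sketch. Amharic (Ge'ez) script contains several character families that are pronounced identically (e.g. ሀ/ሐ/ኀ for "ha", ሰ/ሠ for "se", አ/ዐ for "a", ጸ/ፀ for "tse"), so a transcription may differ from the reference only in which homophone was written. Collapsing each family to one canonical character before scoring avoids penalizing such spelling variants. The mapping table below is illustrative and deliberately incomplete; the exact normalization used in this work is not reproduced here.

```python
# Minimal sketch of Amharic homophone normalization applied before
# computing WER/BLEU. The mapping covers a few common homophonous
# Ge'ez character families; the full table used in the paper is assumed,
# not reproduced.
HOMOPHONE_MAP = str.maketrans({
    "ሐ": "ሀ", "ኀ": "ሀ",  # "ha" family -> canonical ሀ
    "ሓ": "ሃ", "ኃ": "ሃ",  # "haa" forms -> canonical ሃ
    "ሠ": "ሰ", "ሣ": "ሳ",  # "se" family -> canonical ሰ/ሳ
    "ዐ": "አ", "ዓ": "ኣ",  # "a" family  -> canonical አ/ኣ
    "ፀ": "ጸ", "ፃ": "ጻ",  # "tse" family -> canonical ጸ/ጻ
})

def normalize_amharic(text: str) -> str:
    """Collapse homophonous Amharic characters to one canonical form."""
    return text.translate(HOMOPHONE_MAP)
```

With this normalization, a hypothesis and reference that differ only in homophone choice compare as equal, which is why the reported WER and BLEU improve.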