This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language, built from one hour of professionally recorded Malian constitutional text. Designed as a controlled reference set under near-optimal acoustic and linguistic conditions, the benchmark was used to evaluate 37 models, ranging from Bambara-trained systems to large-scale commercial models. Our findings reveal that current ASR performance remains far below deployment standards even in this narrow formal domain: the best Word Error Rate (WER) was 46.76\%, the best Character Error Rate (CER) of 13.00\% was achieved by a different model, and several prominent multilingual models exceeded 100\% WER. These results suggest that multilingual pre-training and model scaling alone are insufficient for underrepresented languages. Furthermore, because this dataset represents a best-case scenario, capturing the most simplified and formal register of spoken Bambara, these figures have yet to be tested against practical, real-world conditions. We provide the benchmark and an accompanying public leaderboard to facilitate transparent evaluation and future research in Bambara speech technology.
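For context on the error rates quoted above, the following is a minimal statement of the standard WER and CER definitions, assuming the conventional edit-distance formulation (the paper does not specify a variant): $S$, $D$, and $I$ count substitutions, deletions, and insertions against a reference of $N$ words or characters, respectively. Because insertions are counted in the numerator, WER can exceed 100\%, which is how several multilingual models obtain scores above that threshold.
\begin{equation}
\mathrm{WER} = \frac{S + D + I}{N_{\text{words}}}, \qquad
\mathrm{CER} = \frac{S + D + I}{N_{\text{chars}}}
\end{equation}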