Useful conversational agents must accurately capture named entities to minimize errors in downstream tasks, for example, when a user asks a voice assistant to play a track by a certain artist, initiates navigation to a specific location, or documents a laboratory result for a patient. However, when named entities such as ``Ukachukwu`` (Igbo), ``Lakicia`` (Swahili), or ``Ingabire`` (Rwandan) are spoken, the performance of automatic speech recognition (ASR) models degrades significantly, propagating errors to downstream systems. We model this problem as a distribution shift and demonstrate that such model bias can be mitigated through multilingual pre-training, intelligent data augmentation strategies that increase the representation of African-named entities, and fine-tuning multilingual ASR models on multiple African accents. The resulting fine-tuned models show an 81.5\% relative word error rate (WER) improvement over the baseline on samples with African-named entities.
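To make the reported metric concrete, the following is a minimal sketch of how a word-level WER and a relative WER improvement are commonly computed. It assumes the standard definitions (word-level edit distance divided by reference length, and baseline WER minus new WER divided by baseline WER); the function names, the example utterance, and the WER values are illustrative assumptions, not the evaluation code or results of this work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row-by-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new model."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a named entity is split into several wrong words.
example_wer = word_error_rate(
    "play a song by ukachukwu",
    "play a song by you catch you",
)

# Hypothetical WER values chosen only to illustrate the formula.
example_gain = relative_wer_improvement(baseline_wer=0.40, new_wer=0.10)
```

Under these definitions, a single misrecognized name can account for several word errors, which is why samples with named entities can dominate the error rate. A relative improvement is measured against the baseline WER rather than as an absolute difference in percentage points.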