Code-switching, the alternation of languages within a single discourse, poses a significant challenge for Automatic Speech Recognition (ASR). Despite the unique nature of the task, performance is commonly measured with established metrics such as Word Error Rate (WER). In this paper, however, we question whether these general metrics accurately assess code-switching performance. Specifically, using both Connectionist Temporal Classification (CTC) and encoder-decoder models, we show that fine-tuning on non-code-switched data from both the matrix and the embedded language improves classical metrics on code-switching test sets, even though performance on the actual code-switched words worsens, as expected. We therefore propose the Point-of-Interest Error Rate (PIER), a variant of WER that considers only specific words of interest. We instantiate PIER on code-switched utterances and show that it describes code-switching performance more accurately, revealing substantial room for improvement in future work. This focused evaluation allows for a more precise assessment of model performance, particularly on challenging phenomena such as inter-word and intra-word code-switching.
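The core idea of restricting WER to words of interest can be illustrated with a minimal sketch: align the hypothesis to the reference with standard Levenshtein edit operations, then count errors only on reference words in the interest set (e.g., the code-switched words). This is an illustrative simplification, not the authors' implementation; in particular, how insertions are attributed to points of interest is a design choice that this sketch sidesteps by counting only substitutions and deletions of interest words.

```python
def _align(ref, hyp):
    """Levenshtein alignment; returns a list of (op, ref_word, hyp_word)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    # Backtrace to recover the edit operations.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1):
            ops.append(("match" if ref[i - 1] == hyp[j - 1] else "sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    return ops[::-1]

def pier(ref, hyp, interest):
    """Error rate restricted to reference words in `interest` (hypothetical signature)."""
    ops = _align(ref, hyp)
    denom = sum(1 for w in ref if w in interest)
    errors = sum(1 for op, r, _ in ops if r in interest and op != "match")
    return errors / denom if denom else 0.0
```

For example, on a German-English utterance where only the embedded-language word "meeting" is of interest, a hypothesis that misrecognizes just that word scores a WER of 0.2 but a PIER of 1.0, making the code-switching failure visible that the aggregate metric dilutes.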