Code-switching, the alternation of languages within a single discourse, presents a significant challenge for automatic speech recognition (ASR). Despite the unique nature of the task, performance is commonly measured with established metrics such as Word Error Rate (WER). In this paper, we question whether these general metrics accurately assess code-switching performance. Specifically, using both Connectionist Temporal Classification and encoder-decoder models, we show that fine-tuning on non-code-switched data from both the matrix and the embedded language improves classical metrics on code-switching test sets, even though performance on the actual code-switched words deteriorates, as expected. We therefore propose the Point-of-Interest Error Rate (PIER), a variant of WER that is computed only over specific words of interest. We instantiate PIER on code-switched utterances and show that it describes code-switching performance more accurately, revealing substantial room for improvement in future work. This focused evaluation allows for a more precise assessment of model performance, particularly on challenging aspects such as inter-word and intra-word code-switching.
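To make the idea concrete, here is a minimal sketch of how a point-of-interest error rate in the spirit of PIER could be computed: align the reference and hypothesis word sequences, then count errors only on reference words flagged as points of interest (e.g., the code-switched words). The function names, the exact error attribution (insertions are not charged to any point of interest here), and the mask representation are illustrative assumptions, not the paper's reference implementation.

```python
def align(ref, hyp):
    """Levenshtein alignment of reference and hypothesis word lists.
    Returns (op, ref_idx, hyp_idx) tuples, op in {"ok", "sub", "del", "ins"}.
    """
    m, n = len(ref), len(hyp)
    # Standard edit-distance DP table.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,
                          d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1)      # insertion
    # Backtrace to recover one optimal alignment.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (
                0 if ref[i - 1] == hyp[j - 1] else 1):
            ops.append(("ok" if ref[i - 1] == hyp[j - 1] else "sub",
                        i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", None, j - 1))
            j -= 1
    return ops[::-1]


def pier(ref, hyp, poi_mask):
    """Error rate restricted to reference words where poi_mask is True
    (e.g., words tagged as code-switched). Substitutions and deletions
    of those words count as errors; the denominator is the POI count.
    """
    errors = sum(1 for op, ri, _ in align(ref, hyp)
                 if ri is not None and poi_mask[ri] and op in ("sub", "del"))
    n_poi = sum(poi_mask)
    return errors / n_poi if n_poi else 0.0
```

With a hypothetical German-English utterance, WER over the whole sentence can look mild while every embedded-language word is wrong; restricting the metric to the masked words exposes that gap, which is the abstract's central point.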