Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong automatic speech recognition (ASR) performance under high-resource conditions. However, existing benchmarks focus predominantly on high-resource languages, leaving the ASR behavior of SpeechLMs on low-resource languages insufficiently understood. This gap matters because practical ASR systems must reliably support low-resource languages and generalize across diverse language families; left unaddressed, it directly hinders the deployment of SpeechLM-based ASR in real-world multilingual scenarios. To address this problem, we propose \textbf{LoASR-Bench}, a comprehensive benchmark designed to evaluate \textbf{lo}w-resource \textbf{a}utomatic \textbf{s}peech \textbf{r}ecognition (\textbf{ASR}) of the latest SpeechLMs across diverse language families. LoASR-Bench comprises 25 languages from 9 language families and covers both Latin and non-Latin scripts, enabling cross-linguistic and cross-script assessment of the ASR performance of current SpeechLMs. Experimental results highlight the limitations of the latest SpeechLMs in handling real-world low-resource languages.