Large Language Models (LLMs) have demonstrated exceptional performance across diverse NLP tasks, and their integration with speech encoders is rapidly emerging as a dominant trend in the Automatic Speech Recognition (ASR) field. Previous works have mainly concentrated on leveraging LLMs for speech recognition in English and Chinese. However, their potential for addressing speech recognition challenges in low-resource settings remains underexplored. Hence, in this work, we aim to explore the capability of LLMs in low-resource ASR and Mandarin-English code-switching ASR. We also evaluate and compare the recognition performance of LLM-based ASR systems against the Whisper model. Extensive experiments demonstrate that LLM-based ASR yields a relative gain of 12.8\% over the Whisper model in low-resource ASR, while Whisper performs better in Mandarin-English code-switching ASR. We hope that this study can shed light on ASR for low-resource scenarios.