Spoken language assessment (SLA) systems restrict themselves to evaluating a speaker's pronunciation and oral fluency by analysing read and spontaneous spoken utterances, respectively. The assessment of language grammar or vocabulary is relegated to written language assessment (WLA) systems. Most WLA systems present sentences drawn from a curated, finite-size database, making it possible to anticipate the test questions and train for them. In this paper, we propose a novel end-to-end SLA system that assesses language grammar from spoken utterances, thus making WLA systems redundant; additionally, we make the assessment largely unteachable by employing a large language model (LLM) to introduce variation into the test. We further demonstrate that a hybrid automatic speech recognition (ASR) system with a custom-built language model outperforms a state-of-the-art ASR engine for spoken grammar assessment.