Large language models (LLMs), known for their strong conversational abilities, are widely regarded as valuable tools in education, particularly for automated intelligent instruction systems for language learning. Motivated by the positive impact of LLMs on text-related scoring tasks, we propose an LLM-based scoring system. Specifically, a speech encoder first maps the learner's speech into contextual features. A modality adapter layer then transforms these features to align them with the text embeddings in the latent space. A task-specific prefix and the prompt text are embedded and concatenated with the features produced by the modality adapter layer, enabling the LLM to predict accuracy and fluency scores. Experiments on the Speechocean762 dataset show that the proposed scoring system achieves competitive results compared with the baselines. We also conduct an ablation study to better understand the contributions of the prompt text and the training strategy to the proposed scoring system.
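The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame-stacking adapter, the dimensions, the random weights, and the ordering of the concatenated sequence (prefix, then speech, then prompt) are all assumptions made for illustration, and the names `adapter` and `build_llm_input` are hypothetical. The speech encoder and the LLM itself are replaced by random placeholder vectors.

```python
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def adapter(speech_feats, weight, bias, stride=2):
    """Hypothetical modality adapter (assumption): stack `stride` consecutive
    encoder frames, then project each stacked frame to the text-embedding size."""
    out = []
    for i in range(0, len(speech_feats) - stride + 1, stride):
        stacked = [v for frame in speech_feats[i:i + stride] for v in frame]
        out.append(linear(stacked, weight, bias))
    return out


def build_llm_input(prefix_emb, speech_emb, prompt_emb):
    """Concatenate along the sequence axis. The order is an assumption;
    the abstract only states that the three parts are concatenated."""
    return prefix_emb + speech_emb + prompt_emb


rng = random.Random(0)
SPEECH_DIM, TEXT_DIM, STRIDE = 4, 6, 2  # illustrative sizes only

# Placeholder for speech-encoder output: 10 frames of contextual features.
speech_feats = [[rng.gauss(0, 1) for _ in range(SPEECH_DIM)] for _ in range(10)]

# Randomly initialised adapter parameters (these would be learned in training).
weight = [[rng.gauss(0, 0.1) for _ in range(SPEECH_DIM * STRIDE)] for _ in range(TEXT_DIM)]
bias = [0.0] * TEXT_DIM

# Placeholders for the embedded task prefix (3 tokens) and prompt text (5 tokens).
prefix_emb = [[rng.gauss(0, 1) for _ in range(TEXT_DIM)] for _ in range(3)]
prompt_emb = [[rng.gauss(0, 1) for _ in range(TEXT_DIM)] for _ in range(5)]

speech_emb = adapter(speech_feats, weight, bias, STRIDE)
llm_input = build_llm_input(prefix_emb, speech_emb, prompt_emb)

# Sequence length and embedding size of the sequence handed to the LLM.
print(len(llm_input), len(llm_input[0]))
```

In this sketch the adapter halves the frame rate before projection, so the speech segment contributes fewer positions to the LLM input than the encoder produced. Whether the proposed system downsamples at all is not stated in the abstract.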