Exploring Self-Reinforcement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

Learnersourcing involves students generating and sharing learning resources with their peers. When learnersourcing multiple-choice questions, creating explanations for the generated questions is a crucial step as it facilitates a deeper understanding of the related concepts. However, it is often difficult for students to craft effective explanations due to limited subject understanding and a tendency to merely restate the question stem, distractors, and correct answer. To help scaffold this task, in this work we propose a self-reinforcement large-language-model framework, with the goal of generating and evaluating explanations automatically. Comprising three modules, the framework generates student-aligned explanations, evaluates these explanations to ensure their quality and iteratively enhances the explanations. If an explanation's evaluation score falls below a defined threshold, the framework iteratively refines and reassesses the explanation. Importantly, our framework emulates the manner in which students compose explanations at the relevant grade level. For evaluation, we had a human subject-matter expert compare the explanations generated by students with the explanations created by the open-source large language model Vicuna-13B, a version of Vicuna-13B that had been fine-tuned using our method, and by GPT-4. We observed that, when compared to other large language models, GPT-4 exhibited a higher level of creativity in generating explanations. We also found that explanations generated by GPT-4 were ranked higher by the human expert than both those created by the other models and the original student-created explanations. Our findings represent a significant advancement in enriching the learnersourcing experience for students and enhancing the capabilities of large language models in educational applications.
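The generate, evaluate, and refine cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `self_reinforce`, the callables `generate`, `evaluate`, and `refine`, the `threshold` of 0.8, and the `max_rounds` cap are all hypothetical names and values, and the toy stand-ins below replace what would be large-language-model calls in the actual framework.

```python
from typing import Callable, Tuple


def self_reinforce(
    question: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    refine: Callable[[str, str, float], str],
    threshold: float = 0.8,  # hypothetical value; the abstract only says "a defined threshold"
    max_rounds: int = 3,     # hypothetical cap so the loop always terminates
) -> Tuple[str, float, int]:
    """Generate an explanation, then refine and reassess it while its score is below threshold."""
    explanation = generate(question)
    score = evaluate(question, explanation)
    rounds = 0
    while score < threshold and rounds < max_rounds:
        explanation = refine(question, explanation, score)
        score = evaluate(question, explanation)
        rounds += 1
    return explanation, score, rounds


# Toy stand-ins for the three modules (assumptions, for illustration only).
def toy_generate(question: str) -> str:
    return "Because B."


def toy_evaluate(question: str, explanation: str) -> float:
    # Scores by word count, capped at 1.0; a real evaluator would judge quality.
    return min(1.0, len(explanation.split()) / 10)


def toy_refine(question: str, explanation: str, score: float) -> str:
    return explanation + " It follows from the definition given in the stem."


explanation, score, rounds = self_reinforce(
    "Which option is correct?", toy_generate, toy_evaluate, toy_refine
)
```

Passing the three modules in as callables keeps the control flow separate from any particular model, so the same loop could wrap different generators or evaluators.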
