How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

Automated explanatory feedback systems play a crucial role in supporting learning for large cohorts of learners by offering feedback that incorporates explanations, significantly enhancing the learning process. However, delivering such explanatory feedback in real time poses challenges, particularly when high classification accuracy is required for nuanced, domain-specific responses. Our study leverages large language models, specifically Generative Pre-trained Transformers (GPT), to explore a sequence labeling approach that identifies the desired and less desired components of praise in a tutor training dataset, as a basis for explanatory feedback. Our aim is to equip tutors with actionable, explanatory feedback during online training lessons. To investigate the potential of GPT models for providing this feedback, we employed two commonly used approaches: prompting and fine-tuning. To quantify the quality of the praise components highlighted by GPT models, we introduced a Modified Intersection over Union (M-IoU) score. Our findings demonstrate that: (1) the M-IoU score correlates well with human judgment in evaluating sequence quality; (2) two-shot prompting on GPT-3.5 yielded decent performance in recognizing effort-based praise (M-IoU of 0.46) and outcome-based praise (M-IoU of 0.68); and (3) our best fine-tuned GPT-3.5 model achieved M-IoU scores of 0.64 for effort-based praise and 0.84 for outcome-based praise, in line with the satisfaction levels rated by human coders. Our results show promise for using GPT models to provide feedback that focuses on specific elements of learners' open-ended responses that are desirable or could use improvement.
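To make the span-overlap idea behind the score concrete, here is a minimal sketch of plain token-level Intersection over Union between a model's highlighted spans and a human coder's. The abstract does not describe how M-IoU modifies standard IoU, so this computes ordinary IoU only and is not the authors' metric. The span representation (end-exclusive token index pairs), the whitespace tokenization, the example response, and the helper names `tokens_in` and `span_iou` are all hypothetical, chosen for illustration.

```python
def tokens_in(spans):
    """Expand (start, end) token spans, end-exclusive, into a set of token indices."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def span_iou(predicted, reference):
    """Plain token-level Intersection over Union of two sets of highlighted spans.

    Note: this is standard IoU, not the paper's Modified IoU (M-IoU),
    whose modification is not described in the abstract.
    """
    pred = tokens_in(predicted)
    ref = tokens_in(reference)
    union = pred | ref
    if not union:
        # Assumption: if neither side highlights anything, count it as full agreement.
        return 1.0
    return len(pred & ref) / len(union)


if __name__ == "__main__":
    # Hypothetical tutor response, tokenized on whitespace for illustration only.
    response = "You worked really hard on this problem and got it right".split()
    human = [(0, 6)]  # human coder highlights "You worked really hard on this"
    model = [(1, 7)]  # model highlights "worked really hard on this problem"
    for name, spans in (("human", human), ("model", model)):
        print(name, [" ".join(response[s:e]) for s, e in spans])
    print(round(span_iou(model, human), 2))  # 5 shared tokens / 7 in the union -> 0.71
```

Because the score is computed over token sets, a prediction that is off by one word at each boundary, as in this example, still scores well above zero, whereas an exact span-match criterion would count it as a complete miss.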
