Data augmentation can mitigate limited training data in machine-learning automated scoring engines (ASEs) for constructed-response items. This study examines how well three approaches to large language model prompting produce simulated essays that preserve the writing quality of the original essays and yield realistic text for augmenting ASE training datasets. We created simulated versions of student essays, and human raters assigned scores to them and rated the realism of the generated text. The results indicate that the predict-next prompting strategy produces the highest agreement between human raters on simulated essay scores, the predict-next and sentence strategies best preserve the rated quality of the original essays in the simulated versions, and the predict-next and 25-examples strategies produce the most realistic text as judged by human raters.