Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold-start problem when developing education tools, and synthetic user data when access to authentic data is restricted for privacy reasons. In this research paper, we conduct a comparative study examining the distribution of bugs generated by LLMs in contrast to those produced by computing students. Leveraging data from two previous large-scale analyses of student-generated bugs, we investigate whether LLMs can be coaxed into exhibiting bug patterns similar to authentic student bugs when prompted to inject errors into code. The results suggest that, when unguided, LLMs do not generate plausible error distributions, and many of the errors they produce are unlikely to be made by real students. However, with guidance that includes descriptions of common errors and their typical frequencies, LLMs can be shepherded toward generating realistic distributions of errors in synthetic code.