This paper analyzes the programming exercise generation capabilities of Large Language Models (LLMs). Through a survey study, we define the state of the art, identify the strengths and weaknesses of the surveyed LLMs, and propose an evaluation matrix that helps researchers and educators decide which LLM is the best fit for generating programming exercises. We find that multiple LLMs are capable of producing useful programming exercises. Nevertheless, challenges remain, such as the ease with which LLMs might solve exercises that were themselves generated by LLMs. This paper contributes to the ongoing discourse on the integration of LLMs in education.