This paper systematically investigates the generation of code explanations by Large Language Models (LLMs) for code examples commonly encountered in introductory programming courses. Our findings reveal significant variation in the nature of the code explanations produced by LLMs, influenced by factors such as the wording of the prompt, the specific code example under consideration, the programming language involved, the temperature parameter, and the version of the LLM. However, a consistent pattern emerges for Java and Python: explanations exhibit a Flesch-Kincaid readability level of approximately grade 7-8 and a consistent lexical density, that is, the proportion of meaningful words relative to the total length of the explanation. Additionally, the generated explanations consistently achieve high scores for correctness, but lower scores on three other metrics: completeness, conciseness, and specificity.
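To make the two text-level measures named above concrete, the following is a minimal Python sketch of how a Flesch-Kincaid grade level and a lexical density could be computed for a single explanation. It is illustrative only and is not taken from the paper's tooling. The grade-level formula is the standard one, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter (`count_syllables`) and the small `FUNCTION_WORDS` list are assumptions made for this sketch: a real analysis would more likely use a pronunciation dictionary and part-of-speech tagging to decide which words count as meaningful.

```python
import re

# Assumption: a tiny hand-picked list of function words standing in for
# proper part-of-speech tagging. Words outside this set count as "meaningful".
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "then", "of", "to", "in",
    "on", "at", "by", "for", "with", "from", "as", "is", "are", "was",
    "were", "be", "been", "it", "its", "this", "that", "these", "those",
    "he", "she", "they", "we", "you", "i", "not", "so", "than",
}


def tokenize(text):
    """Return lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Heuristic syllable count: vowel groups, minus a trailing silent 'e'."""
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level using the standard formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in tokens)
    return (0.39 * len(tokens) / len(sentences)
            + 11.8 * syllables / len(tokens)
            - 15.59)


def lexical_density(text):
    """Share of tokens that are not in the function-word list."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text must contain at least one word")
    content = [w for w in tokens if w not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    explanation = (
        "This loop visits each number in the list. "
        "It adds the number to a running total."
    )
    print("grade level:", round(flesch_kincaid_grade(explanation), 2))
    print("lexical density:", round(lexical_density(explanation), 2))
```

Because both helpers rely on heuristics, the absolute values they produce will differ somewhat from those of established readability tools; the sketch is meant only to show what the two quantities measure.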