This paper systematically explores how Large Language Models (LLMs) generate explanations of code examples of the type used in introductory programming courses. As we show, the nature of the code explanations generated by LLMs varies considerably with the wording of the prompt, the target code example being explained, the programming language, the temperature parameter, and the version of the LLM. Nevertheless, for Java and Python the explanations are consistent in two major respects: readability, which hovers around the 7th to 8th grade level, and lexical density, i.e., the proportion of meaningful (content) words relative to the total number of words in the explanation. Furthermore, the explanations score very high on correctness but lower on three other metrics: completeness, conciseness, and contextualization.
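To make the lexical density measure concrete, the following is a minimal sketch of one way it could be computed. It is not the procedure used in this paper: the function name `lexical_density`, the regular-expression tokenizer, and the small `FUNCTION_WORDS` list are all assumptions made for illustration, and a real analysis would more likely identify content words with part-of-speech tagging.

```python
import re

# Hypothetical, deliberately small list of function words (an assumption for
# this sketch); content words are approximated as every token not in this set.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "for", "with", "by", "from", "as", "is", "are", "was", "were", "be",
    "it", "its", "this", "that", "these", "those", "then", "than",
})


def lexical_density(text: str) -> float:
    """Return the fraction of word tokens that are not function words."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # Avoid division by zero for empty input.
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content_words) / len(tokens)


if __name__ == "__main__":
    explanation = "The loop prints the sum of the list"
    print(lexical_density(explanation))
```

In the sample sentence, four of the eight tokens ("loop", "prints", "sum", "list") fall outside the function-word list, so the sketch reports a density of 0.5. The value depends directly on which words are treated as function words, so scores are comparable only when the same word list or tagger is used throughout.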