Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization

Recent language models have demonstrated proficiency in summarizing source code. However, as in many other domains of machine learning, language models of code lack sufficient explainability. Informally, we lack a formulaic or intuitive understanding of what models learn from code and how they learn it. Explainability of language models can be partially provided if, as the models learn to produce higher-quality code summaries, they also come to deem the same code parts important as human programmers do. In this paper, we report negative results from our investigation of the explainability of language models in code summarization through the lens of human comprehension. We measure human focus on code using eye-tracking metrics, such as fixation counts and fixation duration, collected during code summarization tasks. To approximate language model focus, we employ SHAP (SHapley Additive exPlanations), a state-of-the-art model-agnostic, black-box, perturbation-based approach, to identify which code tokens influence the generation of summaries. Under this setup, we find no statistically significant relationship between the language models' focus and human programmers' attention. Furthermore, alignment between model and human foci in this setting does not appear to determine the quality of the LLM-generated summaries. Our study highlights an inability to align human focus with SHAP-based measures of model focus. This result calls for future investigation of several open questions about explainable language models for code summarization and for software engineering tasks in general, including the training mechanisms of language models for code, whether human and model attention on code align at all, whether human attention can improve the development of language models, and which other measures of model focus are appropriate for improving explainability.
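To make the two focus measures more concrete, the following is a minimal Python sketch, not the authors' pipeline. It computes exact Shapley values for a hypothetical six-token snippet under a hypothetical, hand-written payoff function `toy_value`; in an actual SHAP setup the payoff would come from the model's output with some tokens masked, and SHAP approximates these values by perturbation rather than enumerating every coalition. It then compares the attributions with hypothetical per-token fixation durations using Spearman rank correlation, which is an assumed choice of alignment statistic, since the abstract does not say which test was used.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical tokenized snippet and hand-written payoff (assumption). In a real
# SHAP setup the payoff would be a model output, e.g. the likelihood of the
# generated summary, with tokens outside the coalition masked out.
TOKENS = ["def", "add", "a", "b", "return", "+"]
WEIGHTS = [0, 5, 1, 1, 2, 3]  # hypothetical per-token contributions


def toy_value(coalition: FrozenSet[int]) -> int:
    score = sum(WEIGHTS[t] for t in coalition)
    if 1 in coalition and 5 in coalition:  # interaction between "add" and "+"
        score += 4
    return score


def shapley_values(n: int, value: Callable[[FrozenSet[int]], int]) -> List[Fraction]:
    """Exact Shapley value of each of n tokens by enumerating all coalitions."""
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other tokens
            # Shapley weight |S|! (n - |S| - 1)! / n!, kept exact with Fraction
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution
        phis.append(phi)
    return phis


def average_ranks(xs: Sequence) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda idx: xs[idx])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: Sequence, ys: Sequence) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


phis = shapley_values(len(TOKENS), toy_value)
fixation_ms = [120, 480, 200, 210, 150, 300]  # hypothetical eye-tracking data
rho = spearman(phis, fixation_ms)

for tok, phi, dur in zip(TOKENS, phis, fixation_ms):
    print(f"{tok:>6}  SHAP={float(phi):.2f}  fixation={dur} ms")
print(f"Spearman rho = {rho:.3f}")
```

Exact enumeration needs a number of payoff evaluations that grows exponentially with the number of tokens, which is why perturbation-based tools such as SHAP approximate these values for realistic code lengths. Judging statistical significance would additionally require a p-value (for example from a permutation test), which this sketch omits.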