In recent years, Language Models for Code (LLM4Code) have significantly changed the landscape of software engineering (SE) on downstream tasks such as code generation, making software development more efficient. Consequently, growing interest has emerged in evaluating these models further to standardize the quality assessment of generated code. Because the current evaluation process can over-rely on accuracy-based metrics, practitioners often seek methods to interpret LLM4Code outputs beyond canonical benchmarks. While most research reports code generation effectiveness in terms of agreement with an expected ground truth, scant attention has been paid to the models' explanations; in essence, the decision-making process behind code generation is hard to interpret. To bridge this evaluation gap, we introduce code rationales (Code$Q$), a technique with a rigorous mathematical underpinning that identifies subsets of input tokens that can explain individual code predictions. We conducted a thorough exploratory analysis to demonstrate the method's applicability and a user study to assess the usability of code-based explanations. Our evaluation demonstrates that Code$Q$ is a powerful interpretability method for explaining how (less) meaningful input concepts (e.g., the natural language particle `at') strongly influence output generation. Moreover, participants in our study highlighted Code$Q$'s ability to show a causal relationship between a model's input and output, with readable and informative explanations for code completion and test generation tasks. Code$Q$ also helps to uncover the model's rationale, facilitating comparison with a human rationale to promote an appropriate level of trust and distrust in the model.
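To make the notion of a token-subset explanation concrete, the sketch below shows one plausible way such rationales can be extracted: a greedy search that grows a subset of input tokens until the model, conditioned only on that subset, commits to the same prediction it made with the full context. This is an illustrative assumption, not Code$Q$'s actual implementation; the model interface (`prob_of_target`, `is_committed`) and the toy cue-based scorer are hypothetical stand-ins for a real LLM4Code model.

```python
# Illustrative sketch only: greedily grow a token subset sufficient to
# reproduce the model's prediction. The subset is the "rationale".
from typing import Callable, List, Sequence


def greedy_rationale(
    tokens: Sequence[str],
    prob_of_target: Callable[[List[int]], float],
    is_committed: Callable[[List[int]], bool],
) -> List[str]:
    """Grow a token subset greedily until the model, conditioned only on
    that subset, commits to the target prediction encoded in the callables."""
    chosen: List[int] = []
    remaining = list(range(len(tokens)))
    while remaining:
        # Add the token that most increases the target's probability.
        best = max(remaining, key=lambda i: prob_of_target(chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
        if is_committed(chosen):  # model now predicts the target outright
            break
    return [tokens[i] for i in sorted(chosen)]


# Toy "model" (a hypothetical stand-in): the probability of predicting
# `f.close()` rises with the number of cue tokens in the conditioning set.
TOKENS = ["f", "=", "open", "(", "path", ")"]
CUES = {"f", "open"}

def toy_prob(indices: List[int]) -> float:
    seen = {TOKENS[i] for i in indices}
    return len(seen & CUES) / (len(CUES) + 1)

def toy_committed(indices: List[int]) -> bool:
    seen = {TOKENS[i] for i in indices}
    return CUES <= seen  # commits once all cues are in context

if __name__ == "__main__":
    rationale = greedy_rationale(TOKENS, toy_prob, toy_committed)
    print(rationale)  # -> ['f', 'open']
```

Under these toy assumptions, the extracted rationale is the minimal cue set (`f` and `open`), mirroring the abstract's claim that a small subset of input tokens can explain an individual code prediction.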