Trustworthiness and interpretability are inextricably linked concepts for LLMs: the more interpretable an LLM is, the more trustworthy it becomes. However, current techniques for interpreting LLMs on code-related tasks largely focus on accuracy measurements, on how models react to change, or on individual task performance, rather than on the fine-grained explanations needed at prediction time for greater interpretability, and hence trust. To improve upon this status quo, this paper introduces ASTrust, an interpretability method for LLMs of code that generates explanations grounded in the relationship between model confidence and the syntactic structures of programming languages. ASTrust explains generated code in terms of syntax categories derived from Abstract Syntax Trees (ASTs) and helps practitioners understand model predictions at both the local (individual code snippets) and global (larger datasets of code) levels. By distributing and assigning model confidence scores to well-known syntactic structures within ASTs, our approach moves beyond prior techniques that perform token-level confidence mapping, offering a view of model confidence that aligns directly with programming-language concepts familiar to developers. To put ASTrust into practice, we developed an automated visualization that superimposes aggregated model confidence scores on sequence-, heat-map-, and graph-based views of the syntactic structures in ASTs. We evaluate the practical benefit of ASTrust through a data science study of 12 popular LLMs on a curated set of GitHub repos, and its usefulness through a human study.
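The core idea of aggregating token-level confidence into AST syntax categories can be illustrated with a minimal sketch. This is not the ASTrust implementation; it is a hypothetical Python-only illustration that uses the standard `ast` and `tokenize` modules, takes made-up per-token confidence scores as input, attributes each token to the innermost AST node covering it, and averages the scores per node category (`Call`, `Name`, `Constant`, ...):

```python
import ast
import io
import tokenize
from collections import defaultdict

def confidence_by_syntax(source, token_conf):
    """Average per-token confidence scores over the AST node categories
    that cover each token. `token_conf` maps token strings to hypothetical
    confidence scores an LLM might have emitted for them."""
    tree = ast.parse(source)
    sums, counts = defaultdict(float), defaultdict(int)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        conf = token_conf.get(tok.string)
        if conf is None:
            continue
        # All AST nodes whose source extent covers this token
        # (tokenize and ast both use 1-based lines, 0-based columns).
        covering = [n for n in ast.walk(tree)
                    if hasattr(n, "lineno")
                    and (n.lineno, n.col_offset) <= tok.start
                    and (n.end_lineno, n.end_col_offset) >= tok.end]
        if not covering:
            continue
        # Innermost node = the covering node with the smallest extent
        # (a simplification that is adequate for short snippets).
        node = min(covering, key=lambda n: (n.end_lineno - n.lineno,
                                            n.end_col_offset - n.col_offset))
        cat = type(node).__name__
        sums[cat] += conf
        counts[cat] += 1
    return {cat: round(sums[cat] / counts[cat], 3) for cat in sums}

# Illustrative token confidences (invented, not from any real model):
scores = confidence_by_syntax(
    "x = foo(1) + 2\n",
    {"x": 0.9, "=": 0.95, "foo": 0.6, "(": 0.9,
     "1": 0.7, ")": 0.9, "+": 0.8, "2": 0.75})
# scores now holds one averaged confidence per syntax category,
# e.g. the two Call-delimiter tokens "(" and ")" average to 0.9.
```

A real system would take the confidences from the LLM's output distribution and use a language-agnostic parser (e.g. a grammar-based AST for Java or C) rather than Python's `ast`, but the token-to-node attribution and per-category averaging shown here is the essence of moving from token-level to syntax-level confidence views.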