Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation

As Large Language Models for Code (LM4Code) become integral to software engineering, establishing trust in their outputs is critical. However, standard accuracy metrics obscure the underlying reasoning of generative models and offer little insight into how their decisions are made. Although post-hoc interpretability methods attempt to fill this gap, they often restrict explanations to local, token-level insights, which fail to provide a global analysis that developers can understand. Our work highlights the urgent need for \textbf{global, code-based} explanations that reveal how models reason across code. To support this vision, we introduce \textit{code rationales} (CodeQ), a framework that enables global interpretability by mapping token-level rationales to high-level programming categories. Aggregating thousands of these token-level explanations allows us to perform statistical analyses that expose systemic reasoning behaviors. We validate this aggregation by showing that it distills a clear signal from noisy token data, reducing explanation uncertainty (measured as Shannon entropy) by over 50%. Additionally, we find that a code generation model (\textit{codeparrot-small}) consistently favors shallow syntactic cues (e.g., \textbf{indentation}) over deeper semantic logic. Furthermore, a user study with 37 participants shows that its reasoning is significantly misaligned with that of human developers. These findings, hidden by traditional metrics, demonstrate the importance of global interpretability techniques for fostering trust in LM4Code.
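To make the aggregation idea concrete, the following minimal Python sketch computes the Shannon entropy $H = -\sum_i p_i \log_2 p_i$ of a set of rationales twice. It first uses the raw rationale tokens, then uses the same rationales after mapping them to coarse programming categories. It assumes that each generated token has already been assigned a single rationale token (the input token judged most influential). The rule-based `categorize` function, its category names, the helper names, and the toy data are all hypothetical stand-ins. They are not the CodeQ taxonomy, and the output is not the paper's measurement.

```python
import keyword
import math
from collections import Counter


def categorize(token: str) -> str:
    """Hypothetical rule-based mapping from a rationale token to a coarse
    programming category (illustrative only, not the CodeQ taxonomy)."""
    if token.isspace():
        return "indentation"
    if keyword.iskeyword(token):
        return "keyword"
    if token.isidentifier():
        return "identifier"
    if token.isdigit():
        return "literal"
    return "punctuation"


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def entropy_reduction(rationale_tokens: list[str]) -> tuple[float, float, float]:
    """Return (token-level entropy, category-level entropy, relative reduction)."""
    h_tokens = shannon_entropy(Counter(rationale_tokens))
    h_categories = shannon_entropy(Counter(categorize(t) for t in rationale_tokens))
    reduction = 1.0 - h_categories / h_tokens if h_tokens > 0 else 0.0
    return h_tokens, h_categories, reduction


# Toy rationale tokens (invented for illustration).
rationales = ["    ", "    ", "\t", "def", "return", "x", "y", "total",
              "(", ")", ":", "    "]
h_tok, h_cat, red = entropy_reduction(rationales)
print(f"token-level: {h_tok:.3f} bits, category-level: {h_cat:.3f} bits, "
      f"reduction: {red:.1%}")
```

Each category is a deterministic function of its token, so the category-level entropy can never exceed the token-level entropy, and the reduction is never negative. How large the reduction turns out to be depends on the taxonomy and on the data.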
