While mechanistic interpretability has developed powerful tools for analyzing the internal workings of Large Language Models (LLMs), the complexity of these tools has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable Language Interpretability Analysis), an interactive web application that makes the outputs of several language-model component analyses accessible to a broader audience. The system integrates three key techniques -- Attribution Analysis, Function Vector Analysis, and Circuit Tracing -- and introduces a novel methodology: using a vision-language model to automatically generate natural language explanations (NLEs) for the complex visualizations these methods produce. We empirically validated this approach through a mixed-methods user study, which revealed a clear preference for interactive, explorable interfaces over simpler, static visualizations. A key finding was that the AI-powered explanations helped bridge the knowledge gap for non-experts: statistical analysis showed no significant correlation between a user's prior LLM experience and their comprehension scores, suggesting that the system reduced barriers to comprehension across experience levels. We conclude that an AI system can indeed simplify complex model analyses, but its true power is unlocked when paired with thoughtful, user-centered design that prioritizes interactivity, specificity, and narrative guidance.
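To make the NLE-generation methodology concrete, the sketch below illustrates the general idea of sending an interpretability visualization to a vision-language model and requesting a plain-language explanation. It is a minimal illustration only: the actual ELIA pipeline, model choice, and prompts are not described here, and the OpenAI Python SDK, the `gpt-4o` model, the prompt wording, and the `attribution_heatmap.png` file are all stand-in assumptions.

```python
# Illustrative sketch only: assumes the OpenAI Python SDK and a
# hypothetical attribution-heatmap image; ELIA's real pipeline,
# model, and prompts may differ.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def explain_visualization(image_path: str, technique: str) -> str:
    """Ask a vision-language model for a natural language explanation (NLE)
    of an interpretability visualization, e.g. an attribution heatmap."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any VLM accepting image input would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"This image is a {technique} visualization from a "
                         "language-model interpretability tool. Explain what "
                         "it shows in plain language for a non-expert."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Hypothetical usage: generate an NLE for an attribution-analysis heatmap.
print(explain_visualization("attribution_heatmap.png", "attribution analysis"))
```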