As Large Language Models (LLMs) scale to massive context windows, surgical feature-level interpretation becomes essential for high-stakes tasks such as legal auditing and code debugging. However, existing local model-agnostic explanation methods face a critical limitation in these scenarios: feature-based methods suffer from attribution dilution under high feature dimensionality and thus fail to provide faithful explanations. In this paper, we propose Focus-LIME, a coarse-to-fine framework designed to restore the tractability of surgical interpretation. Focus-LIME uses a proxy model to curate the perturbation neighborhood, allowing the target model to perform fine-grained attribution exclusively within the optimized context. Empirical evaluations on long-context benchmarks demonstrate that Focus-LIME makes surgical explanation practical in long contexts and provides users with faithful explanations.
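The coarse-to-fine pipeline described above can be sketched as a two-stage procedure: a cheap proxy first prunes the long context down to a small candidate set, and the expensive target model then attributes only within that set. The sketch below is illustrative only; the function names are assumptions, and the fine stage uses simple occlusion scoring as a stand-in for the paper's LIME-style perturbation attribution.

```python
def coarse_filter(segments, proxy_score, k):
    """Coarse stage (assumed interface): rank context segments with a cheap
    proxy model's relevance score and keep only the top-k, so the expensive
    target model never sees the full long context."""
    return sorted(segments, key=proxy_score, reverse=True)[:k]

def fine_attribution(segments, target_score):
    """Fine stage (illustrative stand-in): occlusion attribution with the
    target model, restricted to the curated segments. Each segment's
    importance is the score drop when that segment alone is removed."""
    base = target_score(segments)
    return [
        base - target_score([s for j, s in enumerate(segments) if j != i])
        for i in range(len(segments))
    ]

# Toy usage with a hypothetical additive scorer: the target model's score is
# the sum of per-segment weights, so occlusion recovers each weight exactly.
weights = {"clause_a": 3.0, "clause_b": 1.0, "clause_c": 2.0, "clause_d": 0.0}
target_score = lambda segs: sum(weights[s] for s in segs)

kept = coarse_filter(list(weights), proxy_score=weights.get, k=2)
scores = fine_attribution(kept, target_score)
```

Because attribution runs only over the `k` retained segments rather than the full input, the perturbation budget is spent where the proxy indicates relevance, which is what restores tractability at long context lengths.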