Interpreto is an open-source Python library for interpreting HuggingFace language models, from early BERT variants to LLMs. It provides two complementary families of methods: attribution methods and concept-based explanations. The library bridges recent research and practical tooling by exposing explanation workflows through a unified API for both classification and text generation. A key differentiator is its end-to-end concept-based pipeline (from activation extraction to concept learning, interpretation, and scoring), which goes beyond feature-level attributions and is uncommon in existing libraries.