Can we preserve the accuracy of neural models while also faithfully attributing model decisions to training data? We propose a "wrapper box" pipeline: train a neural model as usual, then use its learned feature representation in classic, interpretable models to perform prediction. Across seven language models of varying sizes, including four large language models (LLMs), two datasets at different scales, three classic models, and four evaluation metrics, we first show that the predictive performance of the wrapper classic models is largely comparable to that of the original neural models. Because classic models are transparent, each model decision is determined by a known set of training examples that can be shown directly to users. Our pipeline thus preserves the predictive performance of neural language models while faithfully attributing classic model decisions to training data. Among other use cases, such attribution enables model decisions to be contested on the basis of the responsible training instances. Compared to prior work, our approach achieves higher coverage and correctness in identifying which training data to remove to change a model decision. To reproduce our findings, our source code is available at: https://github.com/SamSoup/WrapperBox.
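The pipeline described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: it assumes k-nearest neighbors as the classic wrapper model and uses random vectors as a stand-in for the features a trained neural encoder would produce. The attribution step then recovers exactly the training examples that determined the prediction.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in for learned neural features (hypothetical: the real pipeline
# would extract these from a trained language model's representations).
X_train = rng.normal(size=(100, 16))
y_train = (X_train[:, 0] > 0).astype(int)

# Fit a classic, interpretable model on the neural feature space.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

x_test = rng.normal(size=(1, 16))
pred = knn.predict(x_test)[0]

# Faithful attribution: the decision is fully determined by these
# k training examples, which can be shown directly to the user
# (or removed to contest/flip the decision).
dists, idx = knn.kneighbors(x_test)
print("prediction:", pred, "responsible training indices:", idx[0])
```

With a transparent model like kNN, removing the responsible neighbors and refitting is a direct way to test whether a decision changes, which is the coverage/correctness setting the abstract refers to.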