Counterfactuals, i.e., modified inputs that lead to a different classifier outcome, are an important tool for understanding the logic used by machine learning classifiers and for determining how to change an undesirable classification. Even when a counterfactual changes a classifier's decision, however, it may not affect the true underlying class probabilities; that is, the counterfactual may act like an adversarial attack and ``fool'' the classifier. We propose Trustworthy Actionable Perturbations (TAP), a new framework for creating modified inputs that change the true underlying probabilities in a beneficial way. The framework includes a novel verification procedure to ensure that TAP change the true class probabilities instead of acting adversarially, along with new cost, reward, and goal definitions that are better suited to effectuating change in the real world. We present PAC-learnability results for our verification procedure and theoretically analyze our new method for measuring reward. We also develop a methodology for creating TAP and compare our results to those achieved by previous counterfactual methods.
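The abstract does not spell out the verification procedure itself. As a rough, hypothetical illustration of the underlying intuition only (not the procedure proposed in the paper), the sketch below cross-checks a candidate perturbation against an independently trained model `g`: a perturbation that genuinely shifts the underlying class probabilities should move both models toward the target class, whereas an adversarial perturbation tailored to the deployed classifier `f` typically will not transfer to `g`. All names here (`f`, `g`, `verify_perturbation`, `margin`) are illustrative assumptions.

```python
import numpy as np

def verify_perturbation(f, g, x, x_prime, target_class, margin=0.0):
    """Hypothetical cross-check: flag a perturbation as trustworthy only if
    an independently trained model g agrees with the deployed classifier f
    that the target-class probability increased. A perturbation that fools
    only f (adversarial) is unlikely to move g in the same direction.

    f, g : callables mapping an input vector to a class-probability vector.
    """
    shift_f = f(x_prime)[target_class] - f(x)[target_class]
    shift_g = g(x_prime)[target_class] - g(x)[target_class]
    return shift_f > margin and shift_g > margin

# Toy usage with linear softmax models standing in for f and g.
def make_softmax_model(W, b):
    def model(x):
        z = W @ x + b
        e = np.exp(z - z.max())  # stabilized softmax
        return e / e.sum()
    return model

rng = np.random.default_rng(0)
f = make_softmax_model(rng.normal(size=(2, 4)), np.zeros(2))
g = make_softmax_model(rng.normal(size=(2, 4)), np.zeros(2))
x = rng.normal(size=4)
x_prime = x + 0.1 * rng.normal(size=4)
print(verify_perturbation(f, g, x, x_prime, target_class=1))
```

The design choice being illustrated is agreement between independently trained models as a proxy for a change in the true class probabilities; the paper's actual verification procedure and its PAC-learnability guarantees are developed formally in the body of the work.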