We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that, had the instances in $\mathcal{S}_t$ been removed prior to training, the classification of a given test point $x_t$ would have been different. Identifying such a set is of interest for two reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of robustness (if $|\mathcal{S}_t|$ is small for $x_t$, we might be less confident in the corresponding prediction), which we show is correlated with but complementary to predicted probabilities. Second, interrogation of $\mathcal{S}_t$ may provide a novel mechanism for contesting a particular model prediction: if one can make the case that the points in $\mathcal{S}_t$ are wrongly labeled or irrelevant, this may argue for overturning the associated prediction. Identifying $\mathcal{S}_t$ by brute force is intractable. We propose comparatively fast approximation methods based on influence functions to find $\mathcal{S}_t$, and find that, for simple convex text classification models, these approaches can often identify relatively small sets of training examples that, if removed, would flip the prediction.
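To make the idea concrete, here is a minimal sketch of one way an influence-function approximation could be used to propose a candidate $\mathcal{S}_t$. It is an illustration under stated assumptions, not necessarily the procedure used in this work: the model is an L2-regularized logistic regression fit with Newton's method, influence is measured on the test logit rather than the predicted probability, and the function names (`fit_logreg`, `candidate_flip_set`) and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=1.0, iters=50):
    """Fit L2-regularized logistic regression (no intercept) by Newton's method."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + l2 * theta
        H = (X * (p * (1 - p))[:, None]).T @ X + l2 * np.eye(d)
        theta = theta - np.linalg.solve(H, grad)
    return theta

def candidate_flip_set(X, y, x_t, theta, l2=1.0):
    """Propose training indices whose removal is *estimated* to flip x_t.

    Returns an index array, or None if the first-order estimate predicts
    that no subset flips the prediction. The result is a candidate only;
    it should be confirmed by retraining without those points.
    """
    p = sigmoid(X @ theta)
    # Hessian of the regularized summed loss at the fitted parameters.
    H = (X * (p * (1 - p))[:, None]).T @ X + l2 * np.eye(X.shape[1])
    # Per-example loss gradients, shape (n, d).
    grads = X * (p - y)[:, None]
    # Removing example i shifts theta by about H^{-1} grad_i, so the test
    # logit shifts by about x_t^T H^{-1} grad_i.
    delta = grads @ np.linalg.solve(H, x_t)
    logit = x_t @ theta
    if logit > 0:
        order = np.argsort(delta)       # most negative shifts first
        est = logit + np.cumsum(delta[order])
        hits = np.nonzero(est < 0)[0]
    else:
        order = np.argsort(-delta)      # most positive shifts first
        est = logit + np.cumsum(delta[order])
        hits = np.nonzero(est > 0)[0]
    if hits.size == 0:
        return None
    return order[: hits[0] + 1]

if __name__ == "__main__":
    # Synthetic data, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
    y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)
    theta = fit_logreg(X, y)
    x_t = 0.05 * np.ones(5)
    S = candidate_flip_set(X, y, x_t, theta)
    if S is None:
        print("no flip predicted")
    else:
        # Verify the candidate by actually retraining without S.
        theta_new = fit_logreg(np.delete(X, S, axis=0), np.delete(y, S))
        flipped = (x_t @ theta > 0) != (x_t @ theta_new > 0)
        print("candidate size:", len(S), "flip confirmed by retraining:", flipped)
```

Because the estimate is first-order, the candidate set is only a proposal: the retraining check at the end is what establishes whether removing those points actually changes the prediction, and the set found this way is small but not guaranteed to be minimal.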