When a machine learning model produces an unsatisfactory prediction, it is crucial to investigate the underlying reasons and to explore whether the outcome can be reversed. We ask: can we flip the prediction on a test point $x_t$ by relabeling the smallest subset $\mathcal{S}_t$ of the training data before the model is trained? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 1% of the training points can often flip the model's prediction. This mechanism serves multiple purposes: (1) it provides a way to contest a model prediction by recovering the influential training subsets; (2) it evaluates model robustness through the cardinality of the subset, $|\mathcal{S}_t|$; we show that $|\mathcal{S}_t|$ is strongly related to the noise ratio in the training set and is correlated with, but complementary to, predicted probabilities; (3) it reveals the training points that lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.
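To make the idea of influence-guided relabeling concrete, the sketch below uses the standard first-order influence approximation for binary, L2-regularized logistic regression; it is not necessarily the extended influence function proposed here. With labels $y_i \in \{0,1\}$, flipping $y_i \to 1-y_i$ changes that example's loss gradient by $(2y_i-1)\,x_i$, so the test logit changes by approximately $\Delta f_i \approx -\tfrac{1}{n}\, x_t^\top H^{-1} (2y_i-1)\, x_i$, where $H$ is the Hessian of the regularized training objective. Assuming these effects add up, a greedy rule picks the points with the largest effect toward the opposite class until the estimated logit changes sign. The function names (`fit_logreg`, `relabel_effects`, `smallest_flip_set`), the Newton solver, the absence of a separate intercept, and the greedy selection are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hessian(X, theta, lam):
    """Hessian of mean log-loss + (lam / 2) * ||theta||^2 (hypothetical helper)."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    return (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)


def fit_logreg(X, y, lam=1e-2, iters=50):
    """Assumed model: L2-regularized logistic regression fit by Newton's method.
    A constant column in X plays the role of the (regularized) intercept."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / len(y) + lam * theta
        theta -= np.linalg.solve(hessian(X, theta, lam), grad)
    return theta


def relabel_effects(X, y, theta, x_t, lam):
    """First-order estimate of the change in the test logit x_t @ theta
    caused by flipping each label y_i -> 1 - y_i, one point at a time."""
    n = X.shape[0]
    # Flipping y_i shifts the per-example gradient by (2 * y_i - 1) * x_i.
    dgrad = (2 * y - 1)[:, None] * X
    # H is symmetric, so x_t^T H^{-1} g_i == g_i^T (H^{-1} x_t).
    hinv_xt = np.linalg.solve(hessian(X, theta, lam), x_t)
    return -(dgrad @ hinv_xt) / n


def smallest_flip_set(X, y, x_t, lam=1e-2):
    """Greedy sketch: return indices whose relabeling is estimated to flip
    the prediction on x_t, or None if the linear estimate never flips it."""
    theta = fit_logreg(X, y, lam)
    logit = x_t @ theta
    s = 1.0 if logit >= 0 else -1.0
    # After multiplying by s, negative values push toward the opposite class.
    effects = s * relabel_effects(X, y, theta, x_t, lam)
    margin = s * logit
    chosen = []
    for i in np.argsort(effects):
        if effects[i] >= 0:
            break  # remaining points would only reinforce the current prediction
        margin += effects[i]
        chosen.append(int(i))
        if margin < 0:
            return chosen
    return None
```

On synthetic data, for example `X = np.hstack([Z, np.ones((200, 1))])` with Gaussian `Z`, calling `smallest_flip_set(X, y, np.array([0.3, 0.0, 1.0]))` returns candidate indices to relabel. Because the per-point effects are summed linearly, the returned set is only an estimate; retraining with those labels flipped is the way to confirm that the prediction actually changes.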