When a machine learning model makes an unsatisfactory prediction, it is crucial to investigate the underlying reasons and to explore whether the outcome can be reversed. We ask: to flip the prediction on a test point $x_t$, how can we identify the smallest training subset $\mathcal{S}_t$ that needs to be relabeled? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 2% of the training points can always flip a prediction. This mechanism can serve multiple purposes: (1) providing an approach to challenge a model prediction by altering training points; (2) evaluating model robustness through the cardinality of the subset (i.e., $|\mathcal{S}_t|$): we show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and is correlated with, but complementary to, predicted probabilities; (3) revealing training points that lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.
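To make the idea of influence-guided relabeling concrete, here is a minimal sketch under strong assumptions that are not taken from the abstract: a binary, L2-regularized logistic regression without a bias term, a first-order influence estimate of how relabeling each training point shifts the test logit, and a simple greedy ranking. The names `hessian`, `fit_logreg`, and `relabel_subset` are hypothetical, the data are synthetic, and the greedy rule is an illustration rather than the procedure proposed here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hessian(X, theta, lam):
    """Hessian of the mean logistic loss plus (lam / 2) * ||theta||^2."""
    p = sigmoid(X @ theta)
    return (X * (p * (1.0 - p))[:, None]).T @ X / len(X) + lam * np.eye(X.shape[1])


def fit_logreg(X, y, lam=0.1, iters=25):
    """Fit L2-regularized logistic regression (no bias term) with Newton steps."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(X) + lam * theta
        theta = theta - np.linalg.solve(hessian(X, theta, lam), grad)
    return theta


def relabel_subset(X, y, x_t, theta, lam=0.1):
    """Greedily pick training points whose label flip is estimated to flip x_t.

    Returns (indices, estimated_logit), or (None, logit) if the first-order
    estimate never crosses the decision boundary.
    """
    n = len(X)
    logit = float(x_t @ theta)
    # Flipping y_i -> 1 - y_i changes that point's loss gradient by (2*y_i - 1) * x_i,
    # so the estimated change in the test logit is -(2*y_i - 1) * x_t^T H^{-1} x_i / n.
    delta = -(2.0 * y - 1.0) * (X @ np.linalg.solve(hessian(X, theta, lam), x_t)) / n
    sign = 1.0 if logit >= 0 else -1.0
    gain = -sign * delta                # > 0 means the flip pushes toward the boundary
    order = np.argsort(-gain)           # most helpful flips first
    order = order[gain[order] > 0]      # ignore flips that reinforce the prediction
    remaining = sign * logit - np.cumsum(gain[order])
    crossed = np.nonzero(remaining < 0)[0]
    if crossed.size == 0:
        return None, logit
    chosen = order[: crossed[0] + 1]
    return chosen, logit + float(delta[chosen].sum())


# Synthetic example (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + rng.normal(size=200) > 0).astype(float)
theta = fit_logreg(X, y)
x_t = np.array([1.0, 0.5])
subset, est_logit = relabel_subset(X, y, x_t, theta)

# The estimate is only first order, so check it by relabeling and refitting.
if subset is not None:
    y_flipped = y.copy()
    y_flipped[subset] = 1.0 - y_flipped[subset]
    theta_new = fit_logreg(X, y_flipped)
    print(len(subset), float(x_t @ theta), est_logit, float(x_t @ theta_new))
```

In this sketch the size of the returned index set plays the role of $|\mathcal{S}_t|$. Because the influence estimate is a linear approximation, the refit at the end is the actual check of whether the prediction on $x_t$ flips.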