When a machine learning model produces an unsatisfactory prediction, users may want to investigate the underlying reasons and explore whether the outcome can be reversed. We ask: to flip the prediction on a test point $x_t$, how can we identify the smallest training subset $\mathcal{S}_t$ that needs to be relabeled? We propose an efficient algorithm that identifies and relabels such a subset via an extended influence function for binary classification models with convex loss. We find that relabeling fewer than 2% of the training points can always flip a prediction. This mechanism can serve multiple purposes: (1) providing an approach to challenge a model prediction by altering training points; (2) evaluating model robustness via the cardinality of the subset, $|\mathcal{S}_t|$: we show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and is correlated with, but complementary to, predicted probabilities; and (3) revealing training points that lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.
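The idea of ranking training points by the estimated effect of relabeling them can be illustrated with a small sketch. This is not the algorithm proposed in the paper; it is a simplified illustration that assumes L2-regularized logistic regression, a first-order influence approximation, and a greedy selection that does not guarantee a minimal subset. The function names (`fit_logreg`, `relabel_subset_to_flip`), the regularization strength, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=1.0, iters=25):
    """Newton's method for L2-regularized logistic regression, labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + l2 * w
        hess = (X.T * (p * (1 - p))) @ X + l2 * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)
    return w

def relabel_subset_to_flip(X, y, x_t, l2=1.0):
    """Greedy sketch: return indices whose relabeling flips the prediction on x_t.

    Returns None if no candidate subset flips the prediction after retraining.
    """
    w = fit_logreg(X, y, l2)
    z_t = x_t @ w                                   # test logit before relabeling
    p = sigmoid(X @ w)
    hess = (X.T * (p * (1 - p))) @ X + l2 * np.eye(X.shape[1])
    # Flipping y_i changes the loss gradient by (2*y_i - 1) * x_i, so the
    # first-order change in the test logit is -(2*y_i - 1) * x_i^T H^{-1} x_t.
    delta = -(2 * y - 1) * (X @ np.linalg.solve(hess, x_t))
    gain = -np.sign(z_t) * delta                    # > 0 moves the logit toward 0
    order = np.argsort(-gain)
    order = order[gain[order] > 0]
    cum = np.cumsum(gain[order])
    # Smallest k whose estimated total effect exceeds |z_t|.
    k_start = int(np.searchsorted(cum, abs(z_t), side="right")) + 1
    for k in range(k_start, len(order) + 1):
        idx = order[:k]
        y_new = y.copy()
        y_new[idx] = 1 - y_new[idx]
        w_new = fit_logreg(X, y_new, l2)            # verify by actually retraining
        if np.sign(x_t @ w_new) != np.sign(z_t):
            return idx
    return None

# Synthetic two-class data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_per_class = 100
X0 = rng.normal(loc=-1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=1.0, size=(n_per_class, 2))
X = np.hstack([np.vstack([X0, X1]), np.ones((2 * n_per_class, 1))])  # bias column
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
x_t = np.array([0.3, 0.3, 1.0])

subset = relabel_subset_to_flip(X, y, x_t)
print("relabeled points:", None if subset is None else len(subset))
```

The retraining step matters because the influence estimate is only a local approximation: the sketch uses it to choose candidates and then confirms the flip on a model refit with the relabeled points. The size of the returned subset plays the role of $|\mathcal{S}_t|$ here, though the greedy ranking only gives an upper bound on the true minimum.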