Training data influence estimation methods quantify the contribution of training documents to a model's output, making them a promising source of information for example-based explanations. Because humans cannot interpret thousands of documents, only a small subset of the training data can be presented as an explanation. Although the choice of which documents to include directly affects explanation quality, previous evaluations of such systems have largely ignored selection strategies. To address this gap, we propose a novel selection relevance score, a retraining-free metric that quantifies how useful a set of examples is for explaining a model's output. We validate this score through fine-tuning experiments, confirming that it can predict whether a set of examples supports or undermines the model's predictions. Using this metric, we further show that common selection strategies often underperform random selection. Motivated by this finding, we propose a strategy that balances influence and representativeness, enabling better use of selection budgets than naively selecting the highest-ranking examples.