Training data influence estimation methods quantify the contribution of training documents to a model's output, making them a promising source of information for example-based explanations. As humans cannot interpret thousands of documents, only a small subset of the training data can be presented as an explanation. Although the choice of which documents to include directly affects explanation quality, previous evaluations of such systems have largely ignored the selection strategy. To address this, we propose a novel selection relevance score, a retraining-free metric that quantifies how useful a set of examples is for explaining a model's output. We validate this score through fine-tuning experiments, confirming that it can predict whether a set of examples supports or undermines the model's predictions. Using this metric, we further show that common selection strategies often underperform random selection. Motivated by this finding, we propose a strategy that balances influence and representativeness, enabling better use of selection budgets than naively selecting the highest-ranking examples.
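To make the influence-versus-representativeness trade-off concrete, the sketch below shows one generic way such a balance could be implemented: a greedy, MMR-style selection that rewards high influence scores while penalizing similarity to already-selected examples. This is an illustrative assumption, not the paper's actual algorithm; the function name, the `lam` trade-off parameter, and the use of cosine similarity over document embeddings as a proxy for representativeness are all hypothetical.

```python
import numpy as np

def select_examples(influence, embeddings, k, lam=0.5):
    """Illustrative greedy selection balancing influence and diversity.

    influence:  (n,) per-example influence scores for the model output.
    embeddings: (n, d) document embeddings (proxy for representativeness).
    k:          selection budget (number of examples shown to the user).
    lam:        trade-off weight; lam=1.0 reduces to naive top-k by influence.
    Returns a list of k selected indices.
    """
    # Normalize rows so that dot products equal cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T
    selected = []
    candidates = set(range(len(influence)))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in candidates:
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(sim[i, j] for j in selected) if selected else 0.0
            score = lam * influence[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the procedure degenerates to picking the highest-ranked examples by influence alone; smaller values of `lam` spend part of the budget on covering distinct regions of the training data.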