Influence functions have recently emerged as a tool for explaining deep neural models: they quantify how perturbing an individual training instance would affect a test prediction. Our objectives in this paper are twofold. First, we incorporate influence functions as feedback into the model to improve its performance. Second, in a dataset extension exercise, we use influence functions to automatically identify data points that were initially "silver" annotated by some existing method and that need to be cross-checked (and corrected) by annotators to improve model performance. To meet these objectives, we introduce InfFeed, which uses influence functions to compute the influential instances for a target instance. Toward the first objective, we adjust the label of the target instance based on the label(s) of its influencer(s). With this adjustment, InfFeed outperforms the state-of-the-art baselines (including LLMs) by a maximum macro F1-score margin of almost 4% for hate speech classification, 3.5% for stance classification, 3% for irony detection, and 2% for sarcasm detection. Toward the second objective, we show that manually re-annotating only those silver-annotated data points in the extension set that have a negative influence substantially improves model performance, bringing it very close to the scenario where all data points in the extension set have gold labels. This greatly reduces the number of data points that need manual annotation: out of the silver-annotated extension dataset, the influence function scheme picks out roughly 1 in 1,000 points for manual correction.
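As a rough illustration of the flagging step behind the second objective, the sketch below scores training points with the classical influence-function approximation on a small logistic regression model. It is not the InfFeed implementation: the model, the function names (`fit_logreg`, `influence_matrix`, `flag_for_review`), the damping value, and the sign convention (a point is flagged when upweighting it would raise the loss on a reference set) are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, l2=1e-2, steps=500, lr=0.5):
    """Plain gradient descent on L2-regularised logistic loss (illustrative only)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w


def per_example_grads(X, y, w):
    # Gradient of the log loss for each row: (p - y) * x
    return (sigmoid(X @ w) - y)[:, None] * X


def influence_matrix(X_train, y_train, X_ref, y_ref, w, l2=1e-2):
    """Entry [j, i] approximates the change in loss on reference point j
    when training point i is upweighted: -g_ref_j^T H^{-1} g_train_i."""
    p = sigmoid(X_train @ w)
    # Hessian of the regularised training loss (l2 acts as damping here)
    H = (X_train.T * (p * (1.0 - p))) @ X_train / len(y_train)
    H += l2 * np.eye(X_train.shape[1])
    g_train = per_example_grads(X_train, y_train, w)
    g_ref = per_example_grads(X_ref, y_ref, w)
    return -g_ref @ np.linalg.solve(H, g_train.T)


def flag_for_review(influence, k):
    """Return up to k training indices whose summed influence on the
    reference set is positive (assumed here to mean 'harmful')."""
    total = influence.sum(axis=0)
    order = np.argsort(-total)[:k]
    return [int(i) for i in order if total[i] > 0]
```

In this toy setting, `X_train` would play the role of the silver-annotated extension set and `X_ref` a small gold-labelled reference set; only the indices returned by `flag_for_review` would be sent to annotators.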