In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out training on the loss function, have been shown to be fragile. The reason proposed for their fragility remains unclear. Although previous work suggests using regularization to increase robustness, this does not hold in all cases. In this work, we revisit the experiments performed in prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where their convexity assumptions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures may cause the observed fragility.
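As a rough illustration of the convex setting described above, the following sketch compares an influence-function estimate with exact leave-one-out retraining. It is not the experimental setup of this work: the ridge-regression model, the synthetic data, and all names (`fit_ridge`, `point_loss`, `influence_vs_leave_one_out`, `lam`) are assumptions chosen so that retraining has a closed form. The estimate used is the standard first-order one, in which removing training point z changes the test loss by approximately (1/n) ∇L(z_test)ᵀ H⁻¹ ∇L(z), with H the Hessian of the training objective.

```python
import numpy as np


def fit_ridge(X, y, lam, n_total):
    """Minimize (1/n_total) * sum_i 0.5*(x_i @ theta - y_i)**2 + (lam/2)*||theta||^2.

    n_total is passed explicitly so that leave-one-out refits keep the same
    1/n weighting as the full objective (an assumption of this sketch).
    """
    d = X.shape[1]
    hessian = X.T @ X / n_total + lam * np.eye(d)
    theta = np.linalg.solve(hessian, X.T @ y / n_total)
    return theta, hessian


def point_loss(theta, x, y):
    """Squared-error loss of a single point."""
    return 0.5 * (x @ theta - y) ** 2


def influence_vs_leave_one_out(n=200, d=5, lam=0.01, seed=0):
    """Return (predicted, actual) test-loss changes for removing each training point."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)
    x_test = rng.normal(size=d)
    y_test = x_test @ w_true + 0.5  # synthetic test point with a nonzero residual

    theta, hessian = fit_ridge(X, y, lam, n)
    base = point_loss(theta, x_test, y_test)

    # Per-point loss gradients at the fitted parameters.
    grad_test = (x_test @ theta - y_test) * x_test
    grads_train = (X @ theta - y)[:, None] * X

    # Influence estimate: (1/n) * grad_test^T H^{-1} grad_train_i for every i.
    predicted = grads_train @ np.linalg.solve(hessian, grad_test) / n

    # Ground truth: refit without point i and measure the change in test loss.
    actual = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_loo, _ = fit_ridge(X[keep], y[keep], lam, n)
        actual[i] = point_loss(theta_loo, x_test, y_test) - base

    return predicted, actual


if __name__ == "__main__":
    predicted, actual = influence_vs_leave_one_out()
    print("Pearson correlation:", np.corrcoef(predicted, actual)[0, 1])
```

The correlation between predicted and actual loss changes is one common way to score such a comparison. In this strongly convex toy problem the two should agree closely, which makes it a useful baseline before moving to non-convex models.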