In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out training on the loss function, have been shown to be fragile. The reason for this fragility remains unclear. Although previous work suggests using regularization to increase robustness, this does not hold in all cases. In this work, we revisit the experiments performed in that prior work to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where their convexity assumptions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Along the way, we analyze the key metrics and procedures used to validate influence functions. Our results indicate that the validation procedures themselves may cause the observed fragility.