Influence diagnostics, such as influence functions and approximate maximum influence perturbations, are widely used in machine learning and AI applications. They are powerful statistical tools for identifying influential data points or subsets of data points. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations computed with efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention-based models on synthetic and real data.
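To make the quantities named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes ridge-regularized least squares as a stand-in for a generalized linear model, and it assumes conjugate gradient as the inverse-Hessian-vector product solver (the abstract does not say which solver is used). All function and variable names (`hvp`, `conjugate_gradient`, `removal_effect`, `amip_increase`) are hypothetical. The influence score follows the standard first-order approximation: removing training point i changes the test loss by roughly (1/n) g_test^T H^{-1} g_i.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hvp(X, v, lam):
    """Hessian-vector product for ridge least squares, H = X^T X / n + lam * I,
    computed without ever forming H."""
    n = len(X)
    out = [lam * vj for vj in v]
    for row in X:
        c = dot(row, v) / n
        for j, xj in enumerate(row):
            out[j] += c * xj
    return out

def conjugate_gradient(matvec, b, iters=50, tol=1e-12):
    """Approximately solve A x = b for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r, p = list(b), list(b)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Synthetic data (assumed setup: n = 40 points, d = 3 features).
random.seed(0)
n, d, lam = 40, 3, 0.1
theta_true = [1.0, -2.0, 0.5]
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [dot(row, theta_true) + random.gauss(0, 0.3) for row in X]
x_test = [random.gauss(0, 1) for _ in range(d)]
y_test = dot(x_test, theta_true) + random.gauss(0, 0.3)

H = lambda v: hvp(X, v, lam)

# Fit: the minimizer satisfies H theta = X^T y / n.
b = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]
theta = conjugate_gradient(H, b)

# Per-example loss gradients at the fitted parameters.
grads = [[(dot(row, theta) - yi) * xj for xj in row] for row, yi in zip(X, y)]
g_test = [(dot(x_test, theta) - y_test) * xj for xj in x_test]

# One inverse-Hessian-vector product, reused for every training point.
s = conjugate_gradient(H, g_test)

# First-order estimate of the change in test loss if point i is removed.
removal_effect = [dot(s, g) / n for g in grads]

# Approximate maximum influence perturbation for dropping k points:
# sum the k largest first-order removal effects.
k = 5
top_k = sorted(range(n), key=lambda i: removal_effect[i], reverse=True)[:k]
amip_increase = sum(removal_effect[i] for i in top_k)
```

The design point this sketch illustrates is that a single inverse-Hessian-vector solve against the test gradient suffices to score all training points, so the cost is dominated by Hessian-vector products and not by forming or inverting the Hessian.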