It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
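To make the quantity being approximated concrete, the following is a minimal sketch of the exact (nonparametric) FSD that the abstract contrasts against: it iterates over every stored input and averages the discrepancy between the outputs of two ReLU networks. It does not implement LAFTR. The discrepancy measure (half the squared Euclidean distance), the helper names (`relu_mlp`, `empirical_fsd`, `init_params`, `perturb`), and the layer sizes are all assumptions made for illustration; the abstract only specifies an "average discrepancy" over the training set.

```python
import random


def matvec(W, b, x):
    """Affine map W x + b, where W is a list of rows (one row per output unit)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def relu_mlp(params, x):
    """Forward pass of a ReLU MLP; params is a list of (W, b) pairs."""
    h = x
    for W, b in params[:-1]:
        h = [max(z, 0.0) for z in matvec(W, b, h)]  # hidden layers use ReLU
    W, b = params[-1]
    return matvec(W, b, h)  # linear output layer


def empirical_fsd(params_a, params_b, inputs):
    """Average over inputs of 0.5 * ||f_a(x) - f_b(x)||^2.

    The squared-error discrepancy is an assumption made for this sketch.
    """
    total = 0.0
    for x in inputs:
        ya, yb = relu_mlp(params_a, x), relu_mlp(params_b, x)
        total += 0.5 * sum((p - q) ** 2 for p, q in zip(ya, yb))
    return total / len(inputs)


def init_params(rng, sizes):
    """Random Gaussian weights scaled by 1/sqrt(fan_in), zero biases."""
    return [
        ([[rng.gauss(0.0, 1.0) / n_in ** 0.5 for _ in range(n_in)]
          for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def perturb(rng, params, scale):
    """Copy of params with Gaussian noise added to every weight."""
    return [
        ([[w + scale * rng.gauss(0.0, 1.0) for w in row] for row in W], list(b))
        for W, b in params
    ]


rng = random.Random(0)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(64)]
net_a = init_params(rng, [8, 16, 4])
net_b = perturb(rng, net_a, 0.1)

fsd_same = empirical_fsd(net_a, net_a, inputs)  # a network compared with itself
fsd_diff = empirical_fsd(net_a, net_b, inputs)  # compared with a perturbed copy
print(fsd_same, fsd_diff)
```

The loop over `inputs` is the cost a parametric approximation is meant to avoid: this exact computation needs the full dataset in memory and one forward pass per example for each network.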