It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e., the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outperforms other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show that it estimates influence functions accurately and detects mislabeled examples without expensive iterations over the entire dataset.
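To make the quantity in the abstract concrete, the following is a minimal sketch of the exact empirical FSD that a parametric approximation would aim to avoid computing, since it needs a full pass over the stored training inputs. Several things here are assumptions for illustration and are not taken from the abstract: the discrepancy is taken to be half the squared Euclidean distance between outputs, the networks are small fully connected ReLU MLPs, and the names `init_mlp`, `forward`, `empirical_fsd`, and `gate_rates` are hypothetical. The `gate_rates` helper shows one plausible reading of "one parameter per unit" (the fraction of training inputs on which each ReLU unit is active); it is not the authors' LAFTR estimator.

```python
import numpy as np


def init_mlp(rng, sizes):
    """Return a list of (W, b) pairs for a fully connected network."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU MLP forward pass.

    Returns the outputs and the binary gate pattern of each hidden layer.
    """
    gates = []
    h = x
    for W, b in params[:-1]:
        pre = h @ W + b
        g = (pre > 0).astype(x.dtype)  # 1 where the unit is active
        gates.append(g)
        h = pre * g  # ReLU written as gate * pre-activation
    W, b = params[-1]
    return h @ W + b, gates


def empirical_fsd(params_a, params_b, X):
    """Average output discrepancy of two networks over the inputs X.

    Assumption: discrepancy = 0.5 * squared Euclidean distance.
    """
    out_a, _ = forward(params_a, X)
    out_b, _ = forward(params_b, X)
    return 0.5 * np.mean(np.sum((out_a - out_b) ** 2, axis=1))


def gate_rates(params, X):
    """Per-unit activation frequency over X: one scalar per hidden unit.

    Illustrative summary statistic only, not the paper's method.
    """
    _, gates = forward(params, X)
    return [g.mean(axis=0) for g in gates]


# Toy data and two randomly initialized networks (all values are synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
net_a = init_mlp(rng, [4, 8, 3])
net_b = init_mlp(rng, [4, 8, 3])

print("FSD(a, b):", empirical_fsd(net_a, net_b, X))
print("gate rates of a:", gate_rates(net_a, X)[0])
```

Writing the ReLU as a gate multiplied by the pre-activation makes the "linear network with gating" view explicit: for a fixed gate pattern the network is linear in its input, which is the structure the abstract says the approximation exploits.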