Machine learning systems such as large-scale recommendation systems or natural language processing systems are usually trained on billions of training points and have hundreds of billions or even trillions of parameters. It is therefore highly desirable to improve the learning process so that the training load is reduced and model accuracy is improved at the same time. In this paper we take a first step toward this goal by studying influence functions with the aim of simplifying the computations they involve. We discuss assumptions under which influence computations can be performed on significantly fewer parameters. We also demonstrate that the sign of the influence value can indicate whether a training point is likely to be memorized rather than generalized upon. To this end, we formally define what memorization means for a training point, in contrast to generalization. We conclude that influence functions can be made practical, even for large-scale machine learning systems, and that influence values can be taken into account by algorithms that selectively remove training points as part of the learning process.
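As a rough illustration of why the number of parameters matters, the sketch below computes the standard first-order influence of upweighting a training point z_i on the loss at a test point, I(z_i, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z_i), where H is the Hessian of the training objective at the fitted parameters (the formulation popularized by Koh and Liang, 2017). The cost is dominated by one d x d linear solve, so it stays cheap only when d is small, for example if influence is restricted to a final linear layer over fixed features. The model (L2-regularized logistic regression), the synthetic data, and the helper names `fit_logreg` and `influence_on_test_loss` are hypothetical choices for this sketch. It does not reproduce the paper's assumptions or its memorization criterion.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logloss(x, yv, theta):
    # Log-loss of a single example (x, yv) under parameters theta.
    p = sigmoid(x @ theta)
    return -(yv * np.log(p) + (1 - yv) * np.log(1 - p))


def fit_logreg(X, y, lam, weights=None, iters=50):
    """Newton's method for sum_i w_i * logloss_i(theta) + (lam / 2) * ||theta||^2."""
    n, d = X.shape
    w = np.ones(n) if weights is None else weights
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (w * (p - y)) + lam * theta
        hess = (X * (w * p * (1 - p))[:, None]).T @ X + lam * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta


def point_grads(X, y, theta):
    # Row i is the gradient of the log-loss of example i with respect to theta.
    return (sigmoid(X @ theta) - y)[:, None] * X


def influence_on_test_loss(X, y, x_test, y_test, theta, lam):
    """Return I(z_i, z_test) = -g_test^T H^{-1} g_i for every training point i."""
    p = sigmoid(X @ theta)
    # Hessian of the (unweighted) regularized training objective: d x d.
    hess = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
    g_train = point_grads(X, y, theta)                                    # (n, d)
    g_test = point_grads(x_test[None, :], np.array([y_test]), theta)[0]  # (d,)
    # One solve H s = g_test replaces an explicit inverse (H is symmetric).
    s_test = np.linalg.solve(hess, g_test)
    return -(g_train @ s_test)


rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = (rng.random(n) < sigmoid(X @ true_theta)).astype(float)
x_test, y_test = rng.normal(size=d), 1.0

theta = fit_logreg(X, y, lam)
infl = influence_on_test_loss(X, y, x_test, y_test, theta, lam)

# Removing point i is upweighting it by eps = -1, so the first-order
# prediction of the resulting change in test loss is -infl[i].
predicted_change = -infl
# Training points whose removal is predicted to lower the test loss the most.
print(np.argsort(predicted_change)[:5])
```

Here a positive I(z_i, z_test) means that upweighting z_i is predicted to raise the test loss. Which sign corresponds to memorization is defined in the paper itself and is not assumed by this sketch.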