In this study, we develop an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem is of practical importance in model selection tasks such as cross-validation (CV) and feature selection. For the class of ML methods known as linear estimators, an efficient model-update framework called the low-rank update can handle changes to a small number of rows and columns of the data matrix. For ML methods beyond linear estimators, however, there is currently no comprehensive framework for obtaining information about the updated solution within a specified computational complexity. To address this, we introduce the Generalized Low-Rank Update (GLRU), which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including widely used methods such as SVM and logistic regression. GLRU not only broadens the range of applicable methods but also provides information about the updated solutions at a computational cost proportional to the amount of change in the dataset. To demonstrate the effectiveness of GLRU, we conduct experiments showing its efficiency in cross-validation and feature selection compared with baseline methods.
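For readers unfamiliar with the classical low-rank update that GLRU generalizes, the following is a minimal sketch of it for one linear estimator, ridge regression. It is not the GLRU method. Removing one instance $(x_i, y_i)$ changes the regularized Gram matrix $A = X^\top X + \lambda I$ by a rank-one term $-x_i x_i^\top$, so the Sherman-Morrison formula updates $A^{-1}$ in $O(d^2)$ time instead of refactoring it in $O(d^3)$. The function names `ridge_fit` and `remove_instance` are hypothetical and chosen for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge regression directly (hypothetical helper).

    Returns the weights, the inverse of A = X^T X + lam * I, and b = X^T y.
    """
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    b = X.T @ y
    return A_inv @ b, A_inv, b


def remove_instance(A_inv, b, x, y_i):
    """Rank-one downdate after deleting one row (x, y_i) (hypothetical helper).

    Uses Sherman-Morrison on A' = A - x x^T together with b' = b - y_i x,
    which costs O(d^2) rather than O(d^3) for a fresh inversion.
    """
    Ax = A_inv @ x                      # A^{-1} x (A is symmetric)
    denom = 1.0 - x @ Ax                # > 0 whenever lam > 0
    A_inv_new = A_inv + np.outer(Ax, Ax) / denom
    b_new = b - y_i * x
    return A_inv_new @ b_new, A_inv_new, b_new


# Usage: leave instance 7 out, as in one fold of leave-one-out CV.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
lam = 1.0

w, A_inv, b = ridge_fit(X, y, lam)
w_loo, _, _ = remove_instance(A_inv, b, X[7], y[7])

# Compare with refitting from scratch on the reduced data.
w_ref, _, _ = ridge_fit(np.delete(X, 7, axis=0), np.delete(y, 7), lam)
print(np.allclose(w_loo, w_ref))
```

This exactness relies on the closed-form solution of a linear estimator. For losses such as the hinge or logistic loss no such closed form exists, which is the gap the abstract describes.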