Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be reformulated as a hierarchical model, which allows easy inference and modular model construction in which KRLS is used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring only a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g., meta-learners) can be estimated quickly.
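To make the sketching idea concrete, the following is a minimal illustration in Python with NumPy, not the authors' implementation. It assumes a Gaussian kernel and a sub-sampling sketch, in which a random subset of observations serves as "landmarks" so that only an n x m kernel block is formed instead of the full n x n matrix. The function names (`gaussian_kernel`, `fit_sketched_krls`, `predict_sketched_krls`), the bandwidth, the penalty `lam`, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / bandwidth) for all row pairs of A and B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / bandwidth)


def fit_sketched_krls(X, y, n_landmarks, lam, bandwidth, seed=0):
    """Sub-sampling sketch (an assumed choice): use m random rows of X as landmarks.

    Minimizes ||y - K_S a||^2 + lam * a' K_SS a, where K_S is the n x m kernel
    block between all observations and the landmarks, and K_SS is the m x m
    kernel among the landmarks.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_landmarks, replace=False)
    landmarks = X[idx]
    K_S = gaussian_kernel(X, landmarks, bandwidth)           # n x m
    K_SS = gaussian_kernel(landmarks, landmarks, bandwidth)  # m x m
    # A small ridge on the diagonal is added purely for numerical stability.
    lhs = K_S.T @ K_S + lam * K_SS + 1e-8 * np.eye(n_landmarks)
    alpha = np.linalg.solve(lhs, K_S.T @ y)
    return landmarks, alpha


def predict_sketched_krls(X_new, landmarks, alpha, bandwidth):
    """Predict using kernel evaluations against the landmarks only."""
    return gaussian_kernel(X_new, landmarks, bandwidth) @ alpha


# Simulated example: a nonlinear outcome with 5,000 observations and 100 landmarks.
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 2))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=5000)

landmarks, alpha = fit_sketched_krls(X, y, n_landmarks=100, lam=0.1, bandwidth=2.0)
fitted = predict_sketched_krls(X, landmarks, alpha, bandwidth=2.0)
print("in-sample MSE:", np.mean((y - fitted) ** 2))
```

In this sketch the linear system solved has dimension m rather than n, which is where the computational savings come from; the cost is that the fitted function is restricted to the span of the kernel evaluated at the sampled landmarks.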