Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be reformulated as a hierarchical model, thereby allowing easy inference and modular model construction in which KRLS can be used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g., meta-learners) can be estimated quickly.
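To make the random sketching idea concrete, the following is a minimal NumPy sketch of kernel ridge regression fit on a sub-sampled kernel. It is an illustration under stated assumptions, not the gKRLS implementation: the function names (`gaussian_kernel`, `fit_sketched_krls`, `predict_sketched_krls`), the Gaussian kernel with a fixed bandwidth, the choice of sub-sampling as the sketching scheme, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / bandwidth) for all row pairs."""
    sq_dist = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / bandwidth)


def fit_sketched_krls(X, y, n_sketch, lam, bandwidth, rng):
    """Fit kernel ridge regression on a sub-sampled (sketched) kernel.

    Assumption: sub-sampling is used as the sketch; other sketching
    matrices (e.g., Gaussian) are possible but not shown here.
    """
    n = X.shape[0]
    # Sub-sampling sketch: keep n_sketch randomly chosen kernel columns.
    idx = rng.choice(n, size=n_sketch, replace=False)
    Z = gaussian_kernel(X, X[idx], bandwidth)  # n x m, plays the role of K S^T
    P = Z[idx, :]                              # m x m, plays the role of S K S^T
    # Minimize ||y - Z a||^2 + lam * a' P a via its normal equations.
    lhs = Z.T @ Z + lam * P
    rhs = Z.T @ y
    # lstsq copes with the near-singular systems that kernels often produce.
    coef = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return X[idx], coef


def predict_sketched_krls(X_new, landmarks, coef, bandwidth):
    """Predict using the kernel between new points and the sampled rows."""
    return gaussian_kernel(X_new, landmarks, bandwidth) @ coef


# Simulated data (hypothetical): a nonlinear signal plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=2000)

landmarks, coef = fit_sketched_krls(
    X, y, n_sketch=50, lam=1.0, bandwidth=2.0, rng=rng
)
fitted = predict_sketched_krls(X, landmarks, coef, bandwidth=2.0)
print("in-sample MSE:", np.mean((y - fitted) ** 2))
```

The point of the sketch is the change in problem size: the full estimator solves a system in N coefficients, whereas the sketched version solves one in only `n_sketch` coefficients, so the dominant cost is forming the N-by-`n_sketch` kernel block. The penalized least-squares form used here is also the kind of objective that mixed-model software can treat as a random effect, which is the intuition behind the hierarchical reformulation described in the abstract.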