Kernel methods provide a flexible and theoretically grounded approach to nonlinear and nonparametric learning. Although their memory and run-time requirements limit their applicability to large datasets, many low-rank kernel approximations, such as random Fourier features, have recently been developed to scale them up. However, these scalable approaches are based on approximations of isotropic kernels, which cannot remove the influence of irrelevant features. In this work, we design random Fourier features for a family of automatic relevance determination (ARD) kernels and introduce RFFNet, a new large-scale kernel method that learns the kernel relevances on the fly via first-order stochastic optimization. We present an effective initialization scheme for the method's non-convex objective function, evaluate whether hard-thresholding RFFNet's learned relevances yields a sensible rule for variable selection, and perform an extensive ablation study of RFFNet's components. Numerical validation on simulated and real-world data shows that our approach has a small memory footprint and run-time, achieves low prediction error, and effectively identifies relevant features, thus leading to more interpretable solutions. We supply users with an efficient, PyTorch-based library that adheres to the scikit-learn standard API, along with code for fully reproducing our results.
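To make the idea of random Fourier features for an ARD kernel concrete, the following is a minimal sketch and not the RFFNet implementation. It assumes a Gaussian ARD kernel of the form k(x, y) = exp(-0.5 * sum_j theta_j^2 (x_j - y_j)^2), which is one member of the ARD family. The function names (`ard_gaussian_kernel`, `sample_rff_params`, `ard_rff_map`), the fixed relevance vector `theta`, and the example inputs are all hypothetical. In RFFNet the relevances are learned by stochastic optimization, a step this sketch leaves out. It uses only the Python standard library so that it stays self-contained.

```python
import math
import random


def ard_gaussian_kernel(x, y, theta):
    """Exact Gaussian ARD kernel (assumed form): exp(-0.5 * sum_j theta_j^2 (x_j - y_j)^2)."""
    sq_dist = sum((t * (a - b)) ** 2 for a, b, t in zip(x, y, theta))
    return math.exp(-0.5 * sq_dist)


def sample_rff_params(dim, n_features, seed=0):
    """Draw frequencies w_i ~ N(0, I) and phases b_i ~ Uniform[0, 2*pi]."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return W, b


def ard_rff_map(x, theta, W, b):
    """Map x to sqrt(2/s) * cos(w_i . (theta * x) + b_i), for i = 1..s."""
    scaled = [t * a for a, t in zip(x, theta)]  # relevances rescale each input feature
    scale = math.sqrt(2.0 / len(W))
    return [
        scale * math.cos(sum(w_j * s_j for w_j, s_j in zip(w, scaled)) + b_i)
        for w, b_i in zip(W, b)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical example: the third feature has zero relevance.
x = [0.3, -1.2, 5.0]
y = [0.1, -0.7, -4.0]
theta = [1.0, 0.5, 0.0]

W, b = sample_rff_params(dim=3, n_features=2000, seed=0)
zx = ard_rff_map(x, theta, W, b)
zy = ard_rff_map(y, theta, W, b)

approx = dot(zx, zy)                      # Monte Carlo estimate of the kernel
exact = ard_gaussian_kernel(x, y, theta)  # closed-form value for comparison

# Changing only the zero-relevance feature leaves the feature map unchanged.
zx_perturbed = ard_rff_map([0.3, -1.2, -100.0], theta, W, b)
```

In this sketch the inner product `dot(zx, zy)` approximates the exact kernel value, with an error that shrinks as the number of random features grows. A relevance of zero removes a feature's influence entirely, which is the property that isotropic kernel approximations lack.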