Random Fourier Features (RFF) is among the most popular and broadly applicable approaches for scaling up kernel methods. In essence, RFF allows the user to avoid costly computations on a large kernel matrix by substituting a fast randomized approximation. However, a pervasive difficulty in applying RFF is that the user does not know the actual error of the approximation, or how this error will propagate into downstream learning tasks. To date, the RFF literature has primarily dealt with these uncertainties through theoretical error bounds, but from a user's standpoint such results are typically impractical, either because they are highly conservative or because they involve unknown quantities. To tackle these issues in a data-driven way, this paper develops a bootstrap approach to numerically estimate the errors of RFF approximations. Three key advantages of this approach are: (1) the error estimates are specific to the problem at hand, avoiding the pessimism of worst-case bounds; (2) the approach is flexible with respect to different uses of RFF, and can even estimate errors in downstream learning tasks; and (3) the approach enables adaptive computation, so that the user can quickly inspect the error of a rough initial kernel approximation and then predict how much extra work is needed. Lastly, all of these benefits come at a modest computational cost.
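To make the idea concrete, here is a minimal sketch of bootstrapping an RFF error estimate. It is not necessarily the paper's exact procedure. It assumes a Gaussian kernel, measures error in the entrywise maximum norm, and uses a simple percentile bootstrap that resamples the D random features with replacement. The spread of ||K* − K̃|| across resamples serves as a proxy for the unknown error ||K̃ − K||. All function names are hypothetical. The extrapolation rule in `features_needed` additionally assumes that the error shrinks like 1/sqrt(D), which is one simple way to "predict how much extra work is needed."

```python
import math
import random


def rff_features(X, D, sigma, rng):
    """Draw D random Fourier features for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    Returns D columns; column j holds sqrt(2) * cos(w_j . x + b_j) for each
    row x of X, with w_j ~ N(0, I / sigma^2) and b_j ~ Uniform[0, 2*pi].
    """
    d = len(X[0])
    cols = []
    for _ in range(D):
        w = [rng.gauss(0.0, 1.0 / sigma) for _ in range(d)]
        b = rng.uniform(0.0, 2.0 * math.pi)
        cols.append([math.sqrt(2.0) * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                     for x in X])
    return cols


def approx_kernel(cols, idx):
    """RFF kernel estimate: the average of phi_j phi_j^T over feature indices idx."""
    idx = list(idx)
    n = len(cols[0])
    K = [[0.0] * n for _ in range(n)]
    for j in idx:
        c = cols[j]
        for a in range(n):
            for b in range(n):
                K[a][b] += c[a] * c[b]
    return [[v / len(idx) for v in row] for row in K]


def exact_kernel(X, sigma):
    """Exact Gaussian kernel matrix, used here only to check the estimate."""
    return [[math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / (2 * sigma ** 2))
             for y in X] for x in X]


def max_abs_diff(A, B):
    """Entrywise maximum error ||A - B||_max."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def bootstrap_error_quantile(cols, B=100, alpha=0.1, seed=0):
    """Estimate the (1 - alpha) quantile of ||K_tilde - K||_max by resampling
    the D features with replacement and measuring ||K_star - K_tilde||_max."""
    rng = random.Random(seed)
    D = len(cols)
    K_tilde = approx_kernel(cols, range(D))
    errs = sorted(
        max_abs_diff(approx_kernel(cols, [rng.randrange(D) for _ in range(D)]), K_tilde)
        for _ in range(B)
    )
    return errs[min(B - 1, math.ceil((1 - alpha) * B) - 1)]


def features_needed(D, q, tol):
    """Hypothetical extrapolation rule: if the error scales like 1/sqrt(D),
    reaching tolerance tol from an estimated error q at D features needs this many."""
    return math.ceil(D * (q / tol) ** 2)


# Usage: 15 random points in 3 dimensions, 100 initial features.
rng = random.Random(1)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(15)]
D = 100
cols = rff_features(X, D, sigma=1.0, rng=rng)
q = bootstrap_error_quantile(cols, B=100, alpha=0.1)
true_err = max_abs_diff(approx_kernel(cols, range(D)), exact_kernel(X, 1.0))
print(f"bootstrap 90% error estimate: {q:.3f}, actual error: {true_err:.3f}")
print("features for tol=0.05:", features_needed(D, q, tol=0.05))
```

The exact kernel is computed here only to compare against the estimate on a toy problem. In practice it is exactly what RFF avoids forming, and the bootstrap needs only the features already drawn.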