Bernstein-Schur Kernels: Random Features by Sketched Modulation and Radial Randomization

Bernstein--Schur kernels are products of a finite-feature kernel and a completely monotone shift-invariant kernel. These nonstationary kernels fall between the shift-invariant and dot-product templates that random features exploit, so neither Bochner sampling nor polynomial sketching applies to the full kernel directly. We give one random-feature construction for the whole class that randomizes both factors: it sketches the finite modulation and samples the radial factor's one-dimensional Bernstein--Widder scale before applying Gaussian random Fourier features, giving feature dimension $Dm$, free of the $O(d^2)$ size of the exact modulation feature. With the modulation kept exact (the $m\to\infty$ limit), we prove unbiasedness, an exact variance, and a matrix-Bernstein operator-norm bound controlled by the top kernel and modulation eigenvalues and an intrinsic dimension rather than the crude $N\max_{ij}$ route. Whitening this argument at the ridge makes the effective dimension $d_{\mathrm{eff}}(\lambda)$ the \emph{exact} intrinsic dimension of the matrix variance, so $O((1+\|P\|_{\mathrm{op}}/\lambda)\log(d_{\mathrm{eff}}/\delta))$ radial draws preserve the kernel-ridge solution; tilting the draw by a closed-form whitened leverage improves this to the effective-dimension count $O((1+d_{\mathrm{eff}})\log(d_{\mathrm{eff}}/\delta))$. Conditioning on the sketch carries every guarantee to the deployed doubly-randomized estimator up to one additive sketch term, and all of them hold for the whole class with the modulation Gram in place of the polynomial one. The flagship instance is the biased $yat$-kernel $k_{yat,b}(w,x)=(w^\top x+b)^2/(\|w-x\|^2+\varepsilon)$, whose family span contains the inverse-multiquadric kernel by finite differences in $b$.
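
As a rough illustration of the doubly-randomized construction for the flagship $yat$-kernel, the following is a minimal NumPy sketch, not the construction analyzed in the paper. It assumes $b\ge 0$ and $\varepsilon>0$ and uses the identity $1/(r^2+\varepsilon)=\int_0^\infty e^{-s\varepsilon}e^{-sr^2}\,ds$, so the scale $s$ can be drawn from an exponential distribution with rate $\varepsilon$ and each draw followed by Gaussian random Fourier features with frequencies $\omega\sim N(0,2sI)$. The modulation sketch shown here (a product of two independent Rademacher projections) is one assumed choice of unbiased sketch for $(w^\top x+b)^2$; the function names `radial_features`, `modulation_sketch`, `yat_features`, and `yat_kernel` are hypothetical.

```python
import numpy as np


def radial_features(X, D, eps, rng):
    """D random features whose inner products estimate 1/(||w-x||^2 + eps)."""
    d = X.shape[1]
    # Bernstein-Widder scale: s ~ Exponential(rate=eps); the measure has mass 1/eps.
    s = rng.exponential(scale=1.0 / eps, size=D)
    # exp(-s r^2) is a Gaussian kernel with spectral distribution N(0, 2s I).
    omega = rng.standard_normal((D, d)) * np.sqrt(2.0 * s)[:, None]
    phase = rng.uniform(0.0, 2.0 * np.pi, size=D)
    # The factor 1/eps restores the total mass of the scale measure.
    return np.sqrt(2.0 / (eps * D)) * np.cos(X @ omega.T + phase)


def modulation_sketch(X, m, b, rng):
    """m-dimensional sketch whose inner products estimate (w.x + b)^2 (assumes b >= 0)."""
    n, d = X.shape
    # Augment so that <xa_i, xa_j> = x_i.x_j + b.
    Xa = np.hstack([X, np.full((n, 1), np.sqrt(b))])
    G = rng.choice([-1.0, 1.0], size=(m, d + 1))
    H = rng.choice([-1.0, 1.0], size=(m, d + 1))
    # E[(g.a)(g.c)] = a.c for Rademacher g, so the product of two
    # independent projections is unbiased for (a.c)^2.
    return (Xa @ G.T) * (Xa @ H.T) / np.sqrt(m)


def yat_features(X, D, m, b, eps, rng):
    """Row-wise Kronecker product of both factors: feature dimension D*m."""
    Z = modulation_sketch(X, m, b, rng)
    Phi = radial_features(X, D, eps, rng)
    return (Z[:, :, None] * Phi[:, None, :]).reshape(X.shape[0], -1)


def yat_kernel(X, b, eps):
    """Exact Gram matrix of (w.x + b)^2 / (||w-x||^2 + eps), for comparison."""
    G = X @ X.T
    sq = np.sum(X**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * G
    return (G + b) ** 2 / (dist2 + eps)
```

All points must be featurized with the same random draws, which is why each function takes the full data matrix `X`. Because the inner product of Kronecker features factorizes, `F @ F.T` for `F = yat_features(...)` equals the elementwise product of the two factor estimates, and can be compared against `yat_kernel(X, b, eps)` to gauge the approximation error for a given `D` and `m`.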
