We study feature learning in a compositional variant of kernel ridge regression in which the predictor is applied to a learnable linear transformation of the input. When the response depends on the input only through a low-dimensional predictive subspace, we show that all global minimizers of the population objective for the linear transformation annihilate directions orthogonal to this subspace and, in certain regimes, exactly identify the subspace. Moreover, we show that global minimizers of the finite-sample objective inherit the same low-dimensional structure with high probability, even without any explicit penalization on the linear transformation.