The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influential coordinates of the target function with low sample complexity. Moreover, by iteratively using these derivatives to reweight the data and retrain kernel machines, one can efficiently learn hierarchical polynomials with finite leap complexity. Numerical experiments illustrate the developed theory.
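As a rough illustration of the iterative procedure described above, the following is a minimal numerical sketch rather than the paper's exact algorithm: it fits Gaussian-kernel ridge regression, scores each coordinate by the average squared partial derivative of the fitted predictor, reweights the coordinates by those scores, and retrains. The toy target, kernel bandwidth, ridge parameter, and number of reweighting rounds are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's exact method):
# Gaussian-kernel ridge regression with coordinate weights, where the average squared
# derivative of the fitted predictor is used to rescale coordinates and retrain.
import numpy as np

def gaussian_kernel(X, Z, w, sigma=1.0):
    """Weighted Gaussian kernel k(x, z) = exp(-||w * (x - z)||^2 / (2*sigma^2))."""
    D = (X[:, None, :] - Z[None, :, :]) * w            # (n, m, d) weighted differences
    return np.exp(-np.sum(D**2, axis=2) / (2 * sigma**2))

def fit_krr(X, y, w, lam=1e-3, sigma=1.0):
    """Kernel ridge regression: alpha = (K + lam * n * I)^{-1} y."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, w, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def coordinate_scores(X, alpha, w, sigma=1.0):
    """Average squared partial derivative of the kernel predictor over the training points."""
    n, d = X.shape
    K = gaussian_kernel(X, X, w, sigma)
    scores = np.zeros(d)
    for j in range(d):
        # d/dx_j of k(x, x_i) = -k(x, x_i) * w_j^2 * (x_j - x_ij) / sigma^2
        diff_j = X[:, None, j] - X[None, :, j]
        grad_j = -(K * diff_j) @ alpha * (w[j] ** 2) / sigma**2
        scores[j] = np.mean(grad_j**2)
    return scores

# Toy target depending only on the first two of d coordinates (hierarchical: x1 * x2).
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
y = X[:, 0] * X[:, 1]

w = np.ones(d)                                         # initial coordinate weights
for _ in range(3):                                     # iterative reweight-and-retrain rounds
    alpha = fit_krr(X, y, w)
    s = coordinate_scores(X, alpha, w)
    w = np.sqrt(s / s.max())                           # reweight coordinates by derivative scores
print("learned coordinate weights:", np.round(w, 2))   # influential coordinates keep large weight
```

In this sketch the influential coordinates retain weights near one while the irrelevant ones are progressively suppressed, so the retrained kernel machine effectively operates on a low-dimensional representation of the data.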