To better understand feature learning in neural networks, we propose a framework for analyzing linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear transformations of features, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this optimization problem is equivalent to a linearly constrained optimization problem with structured regularization that encourages approximately low-rank solutions. Specializing to neural network structure, we gain insights into how the features, and thus the kernel function, change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented using tangent features. In addition to verifying our theoretical observations in real neural networks on a simple regression problem, we empirically show that an adaptive feature implementation of tangent feature classification has an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.
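As a rough illustration of the setup described above, and not the authors' implementation, the following sketch contrasts a fixed tangent feature model with one whose features pass through a trainable linear map. Everything specific here is an assumption made for the example: the two-layer tanh network, the helper `tangent_features`, the symbols `M` (feature transformation) and `beta` (parameters), the toy target `sin(x_0)`, and the use of plain gradient descent on a squared loss in place of the constrained formulation.

```python
import numpy as np

def tangent_features(X, W, a):
    """Gradient of f(x) = a @ tanh(W @ x) with respect to (a, W), one row per input."""
    H = np.tanh(X @ W.T)                                  # (n, h) hidden activations
    dW = (a * (1.0 - H**2))[:, :, None] * X[:, None, :]   # (n, h, d): df/dW_jk = a_j (1 - H_j^2) x_k
    return np.concatenate([H, dW.reshape(len(X), -1)], axis=1)

rng = np.random.default_rng(0)
n, d, h = 20, 3, 16                      # hypothetical sizes for a toy regression problem
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                      # toy target, chosen only for illustration
W = rng.normal(size=(h, d)) / np.sqrt(d)
a = rng.normal(size=h) / np.sqrt(h)

Phi = tangent_features(X, W, a)          # (n, p) tangent features at initialization
p = Phi.shape[1]

# Fixed tangent features: minimum-norm parameters satisfying Phi @ beta = y.
beta_fixed = np.linalg.pinv(Phi) @ y

# Adaptive features: the prediction Phi @ M @ beta is bilinear in (M, beta).
M = np.eye(p)                            # start from the untransformed features
beta = np.zeros(p)
lr = 0.1 * n / np.linalg.norm(Phi, 2) ** 2   # conservative step size (assumption)
losses = []
for _ in range(300):
    r = Phi @ M @ beta - y               # residuals on the training set
    losses.append(0.5 * np.mean(r**2))
    g = Phi.T @ r / n
    grad_beta = M.T @ g                  # gradient of the loss with respect to beta
    grad_M = np.outer(g, beta)           # gradient of the loss with respect to M
    beta -= lr * grad_beta
    M -= lr * grad_M

# Kernels induced by the fixed and the transformed features.
K_fixed = Phi @ Phi.T
K_adapt = Phi @ M @ M.T @ Phi.T
```

Comparing `K_adapt` with `K_fixed` gives one simple way to inspect how the kernel changes once the features are allowed to adapt. The low-rank structure discussed above would show up in the singular values of `M @ beta` style quantities only under the paper's actual regularized formulation, which this sketch does not reproduce.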