Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks from natural images resemble those learnt by independent component analysis (ICA), a simple unsupervised method that seeks the most non-Gaussian projections of its inputs. This similarity suggests that ICA provides a simple yet principled model for studying feature learning. Here, we leverage this connection to investigate the interplay between data structure and optimisation in feature learning for the most popular ICA algorithm, FastICA, and for stochastic gradient descent (SGD), which is used to train deep networks. We rigorously establish that FastICA requires at least $n \gtrsim d^4$ samples to recover a single non-Gaussian direction from $d$-dimensional inputs in a simple synthetic data model. We show that vanilla online SGD outperforms FastICA, and prove that the optimal sample complexity $n \gtrsim d^2$ can be reached by smoothing the loss, albeit in a data-dependent way. Finally, we demonstrate the existence of a search phase for FastICA on ImageNet, and discuss how the strong non-Gaussianity of these images compensates for the poor sample complexity of FastICA.
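To make the role of FastICA concrete, the following is a minimal sketch (in Python/NumPy, not the authors' code) of the one-unit FastICA fixed-point iteration on a toy model with a single hidden non-Gaussian direction. The dimension, sample size, Rademacher latent variable, and cubic (kurtosis) nonlinearity are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of one-unit FastICA on a toy single-spike model:
# x = z*u + Gaussian noise orthogonal to u, with z Rademacher (non-Gaussian).
# d and n below are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 200_000                                    # toy values; recovery from a random
                                                      # start needs n large compared to d

# Hidden non-Gaussian direction and synthetic whitened data (identity covariance).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
z = rng.choice([-1.0, 1.0], size=n)                   # Rademacher signal, excess kurtosis -2
G = rng.standard_normal((n, d))
X = np.outer(z, u) + G - np.outer(G @ u, u)           # Gaussian noise kept orthogonal to u

# One-unit FastICA with the kurtosis nonlinearity g(t) = t^3, g'(t) = 3t^2:
#   w <- E[x g(w.x)] - E[g'(w.x)] w, then renormalise.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(100):
    t = X @ w
    w_new = (X.T @ t**3) / n - 3.0 * w                # E[g'(w.x)] = 3 for whitened data
    w_new /= np.linalg.norm(w_new)
    if abs(w_new @ w) > 1.0 - 1e-9:                   # converged up to a sign flip
        w = w_new
        break
    w = w_new

print(f"overlap |<w, u>| = {abs(w @ u):.3f}")         # close to 1 means the direction was recovered
```

In this toy setting, the abstract's result suggests that recovery from a random initialisation requires roughly $n \gtrsim d^4$ samples for FastICA, whereas online SGD, and in particular SGD on a smoothed loss, can succeed with fewer.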