Neural networks extract features from data using stochastic gradient descent (SGD). In particular, higher-order input cumulants (HOCs) are crucial for their performance. However, extracting information from the $p$th cumulant of $d$-dimensional inputs is computationally hard: the number of samples required to recover a single direction from an order-$p$ tensor (tensor PCA) using online SGD grows as $d^{p-1}$, which is prohibitive for high-dimensional inputs. This result raises the question of how neural networks efficiently extract relevant directions from the HOCs of their inputs. Here, we show that correlations between latent variables along the directions encoded in different input cumulants speed up learning from higher-order correlations. We show this effect analytically by deriving nearly sharp thresholds for the number of samples required by a single neuron to weakly recover these directions using online SGD from a random start in high dimensions. Our analytical results are confirmed in simulations of two-layer neural networks and unveil a new mechanism for hierarchical learning in neural networks.
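To make the setting concrete, the following is a minimal sketch of a single neuron trained with online SGD on the unit sphere from a random start, tracking its overlap with a hidden direction. It is an illustration under stated assumptions, not the model analysed in this work: the data model (isotropic Gaussian inputs with a label given by the third Hermite polynomial of a hidden projection), the correlation loss, the step size, and the names `hermite3`, `hermite3_prime` and `online_spherical_sgd` are all hypothetical choices made for this example.

```python
import numpy as np


def hermite3(s):
    """Third probabilists' Hermite polynomial, He_3(s) = s^3 - 3s."""
    return s**3 - 3.0 * s


def hermite3_prime(s):
    """Derivative of He_3, i.e. 3s^2 - 3."""
    return 3.0 * s**2 - 3.0


def online_spherical_sgd(d=50, n_steps=2000, lr=0.01, seed=0):
    """Run online SGD for one neuron constrained to the unit sphere.

    Assumed toy setup (not taken from the source): inputs x ~ N(0, I_d),
    labels y = He_3(u . x) for a hidden unit vector u, and the
    correlation loss -y * He_3(w . x).
    Returns the final weight vector and the overlap w . u after each step.
    """
    rng = np.random.default_rng(seed)

    # Hidden direction to recover (first basis vector, without loss of generality).
    u = np.zeros(d)
    u[0] = 1.0

    # Random start on the unit sphere: the initial overlap is of order 1/sqrt(d).
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    overlaps = np.empty(n_steps)
    for t in range(n_steps):
        # Online SGD: a fresh sample at every step, never reused.
        x = rng.standard_normal(d)
        y = hermite3(u @ x)

        # Gradient of the correlation loss with respect to w.
        grad = -y * hermite3_prime(w @ x) * x

        # Remove the radial component, take a step, and renormalise.
        grad -= (grad @ w) * w
        w = w - (lr / d) * grad
        w /= np.linalg.norm(w)

        overlaps[t] = w @ u
    return w, overlaps


if __name__ == "__main__":
    w, overlaps = online_spherical_sgd()
    print("initial overlap:", overlaps[0])
    print("final overlap:  ", overlaps[-1])
```

Weak recovery in this sketch would correspond to the overlap `w @ u` growing beyond its initial scale of order $1/\sqrt{d}$. Whether it does so within a given number of steps depends on `d`, `lr` and `n_steps`, which are untuned here.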