Deep neural networks have attained remarkable success across diverse classification tasks. Recent empirical studies have shown that deep networks learn features that are linearly separable across classes. However, these findings often lack rigorous justification, even in relatively simple settings. In this work, we address this gap by examining the linear separation capability of shallow nonlinear networks. Specifically, inspired by the low intrinsic dimensionality of image data, we model inputs as a union of low-dimensional subspaces (UoS) and demonstrate that a single nonlinear layer can transform such data into linearly separable sets. Theoretically, we show that this transformation occurs with high probability when using random weights and quadratic activations. Notably, we prove that this separation can be achieved when the network width scales polynomially with the intrinsic dimension of the data rather than the ambient dimension. Experimental results corroborate these theoretical findings and demonstrate that similar linear separation properties hold in practical scenarios beyond our analytical scope. This work bridges the gap between empirical observations and theoretical understanding of the separation capacity of nonlinear networks, offering deeper insights into model interpretability and generalization.
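To make the claimed mechanism concrete, below is a minimal numerical sketch, not the paper's actual experimental setup: two classes are drawn from random low-dimensional subspaces (a UoS model), passed through one layer of random Gaussian weights with an elementwise quadratic activation, and then tested for linear separability with a linear SVM. All sizes (`D`, `d`, `m`, `n`), the `1/sqrt(D)` weight scale, and the use of `LinearSVC` are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical sizes: ambient dimension D, intrinsic (subspace) dimension d,
# network width m, samples per class n.
D, d, m, n = 100, 3, 50, 500

# Two classes supported on random d-dimensional subspaces of R^D (a UoS model).
U0 = np.linalg.qr(rng.standard_normal((D, d)))[0]
U1 = np.linalg.qr(rng.standard_normal((D, d)))[0]
X0 = (U0 @ rng.standard_normal((d, n))).T
X1 = (U1 @ rng.standard_normal((d, n))).T
# Normalize to the unit sphere so separability is not a trivial scale effect.
X0 /= np.linalg.norm(X0, axis=1, keepdims=True)
X1 /= np.linalg.norm(X1, axis=1, keepdims=True)

# One nonlinear layer: random Gaussian weights W followed by a quadratic
# activation applied coordinatewise, phi(x) = (W x)^2.
W = rng.standard_normal((m, D)) / np.sqrt(D)
features = lambda X: (X @ W.T) ** 2

X = np.vstack([X0, X1])
y = np.hstack([np.zeros(n), np.ones(n)])

# Fit a linear classifier on the raw inputs and on the quadratic features.
# Raw inputs stay near chance (each class is symmetric about the origin),
# while the quadratic features are linearly separable.
for name, Z in [("raw inputs", X), ("quadratic features", features(X))]:
    acc = LinearSVC(C=1e3, max_iter=20000).fit(Z, y).score(Z, y)
    print(f"{name}: training accuracy = {acc:.3f}")
```

The width here is modest because, restricted to the union of the two subspaces, a quadratic form depends on only about d(d+1)/2 coefficients per subspace, so m need only grow with the intrinsic dimension d, not the ambient dimension D, which is the scaling the abstract asserts.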