Bridging the gap between the practical performance of deep learning and its theoretical foundations often involves analyzing neural networks trained with stochastic gradient descent (SGD). Building on prior work that modeled structured inputs under a simple Gaussian setting, we analyze the behavior of a deep learning system trained on inputs drawn from Gaussian mixtures, a model that better captures general structured inputs. Through empirical analysis and theoretical investigation, we show that under certain standardization schemes the model converges to the behavior observed in the Gaussian setting, even when the input data follow more complex or real-world distributions. This finding reveals a form of universality: diverse structured distributions yield results consistent with Gaussian assumptions, which supports the theoretical understanding of deep learning models.
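To make the experimental protocol described above concrete, the following is a minimal sketch (not the authors' code) of the comparison the abstract outlines: train the same two-layer network with online SGD on (i) per-coordinate standardized Gaussian-mixture inputs and (ii) pure Gaussian inputs, then compare test errors. All specifics below (dimension, hidden width, learning rate, the ±1 mixture means, and the tanh teacher task) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden, lr, steps = 100, 10, 0.1, 20000  # assumed hyperparameters

def sample_mixture(n):
    """Draw n points from a two-component Gaussian mixture in R^d."""
    centers = np.stack([np.ones(d), -np.ones(d)])  # assumed +/-1 means
    labels = rng.integers(0, 2, size=n)
    return centers[labels] + rng.standard_normal((n, d))

def standardize(x):
    """Per-coordinate standardization: zero mean, unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Fixed "teacher" defining the regression target y = f*(x) (illustrative).
w_teacher = rng.standard_normal(d) / np.sqrt(d)
target = lambda x: np.tanh(x @ w_teacher)

def train(x_train, x_test):
    """Online SGD on a two-layer tanh network; returns final test MSE."""
    W = rng.standard_normal((hidden, d)) / np.sqrt(d)   # first layer
    v = rng.standard_normal(hidden) / np.sqrt(hidden)   # second layer
    y_test = target(x_test)
    for t in range(steps):
        x, y = x_train[t], target(x_train[t])
        h = np.tanh(W @ x)
        err = v @ h - y                                 # d(0.5*err^2)/d(output)
        v -= lr * err * h                               # second-layer step
        W -= lr * err * np.outer(v * (1 - h**2), x)     # backprop through tanh
    pred = np.tanh(x_test @ W.T) @ v
    return np.mean((pred - y_test) ** 2)

x_mix = standardize(sample_mixture(steps))              # structured inputs
x_gauss = rng.standard_normal((steps, d))               # Gaussian baseline
x_test = rng.standard_normal((2000, d))                 # common test set
print("mixture  test MSE:", train(x_mix, x_test))
print("gaussian test MSE:", train(x_gauss, x_test))
```

Under the universality claim in the abstract, the two printed test errors should track each other closely once the mixture inputs are standardized; the single shared test set is a deliberate design choice so that any gap reflects the training distribution alone.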