While artificial neural networks have demonstrated exceptional practical success across a variety of domains, investigations into their theoretical characteristics, such as approximation power, statistical properties, and generalization performance, have also made significant strides. In this paper, we develop a novel theory for understanding the effectiveness of neural networks by explaining a common practice in neural network model construction: sample splitting. Our theory shows that the optimal hyperparameters derived from sample splitting yield a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results support our theory.
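As a rough illustration of the sample-splitting practice described in the abstract, the sketch below holds out part of the data, fits one model per candidate hyperparameter on the remainder, and keeps the candidate with the lowest held-out prediction risk. It is a minimal sketch under stated assumptions, not the paper's procedure: the network is a one-hidden-layer model with fixed random hidden weights and a ridge-fitted output layer (a simplification chosen so that training has a closed form), the hyperparameter being selected is the hidden width, and the function names, synthetic data, and split fraction are all hypothetical.

```python
import numpy as np


def fit_random_feature_net(X, y, width, ridge, rng):
    """Fit a one-hidden-layer net whose hidden weights are random and fixed.

    Only the output layer is trained, via closed-form ridge regression.
    This is an illustrative stand-in, not the model class from the paper.
    """
    W = rng.normal(size=(X.shape[1], width))  # hidden weights (not trained)
    b = rng.normal(size=width)                # hidden biases (not trained)
    H = np.tanh(X @ W + b)                    # hidden activations, shape (n, width)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(width), H.T @ y)
    return W, b, beta


def predict(params, X):
    W, b, beta = params
    return np.tanh(X @ W + b) @ beta


def select_by_sample_splitting(X, y, widths, ridge=1e-3, train_frac=0.7, seed=0):
    """Pick the hidden width with the lowest squared-error risk on a held-out split."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    n_train = int(train_frac * n)
    train_idx, val_idx = perm[:n_train], perm[n_train:]

    risks = {}
    for width in widths:
        params = fit_random_feature_net(X[train_idx], y[train_idx], width, ridge, rng)
        residual = predict(params, X[val_idx]) - y[val_idx]
        risks[width] = float(np.mean(residual ** 2))  # empirical validation risk

    best_width = min(risks, key=risks.get)
    return best_width, risks


# Synthetic regression data (hypothetical, for illustration only).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-2, 2, size=(400, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * data_rng.normal(size=400)

best_width, risks = select_by_sample_splitting(X, y, widths=[1, 4, 16, 64])
print("validation risks:", risks)
print("selected width:", best_width)
```

The selected width is the minimizer of the empirical risk on the held-out split. The abstract's claim concerns the asymptotic prediction risk of a model chosen in this manner, which a small simulation like this one can only illustrate, not verify.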