The statistical properties of deep neural networks (DNNs) at initialization play an important role in understanding their trainability and the intrinsic architectural biases they possess before data exposure. Well-established mean field (MF) theories have shown that the distribution of parameters in randomly initialized networks strongly influences the behavior of the gradients, dictating whether they explode or vanish. Recent work has shown that untrained DNNs also manifest an initial guessing bias (IGB), in which large regions of the input space are assigned to a single class. In this work, we provide a theoretical proof that links IGB to previous MF theories for a vast class of DNNs, showing that efficient learning is tightly connected to a network's prejudice towards a specific class. This connection leads to a counterintuitive conclusion: the initialization that optimizes trainability is systematically biased rather than neutral.