The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. It is an indicator of the expressive power of machine learning models and is important for comparing the performance of different models. In this study, we analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions, using the replica method from statistical physics. Our results demonstrate that the storage capacity per parameter remains finite even in the infinite-width limit and that the weights of the network exhibit negative correlations, leading to a 'division of labor' among hidden units. In addition, we find that increasing the dataset size triggers a phase transition at which the permutation symmetry of the weights is broken and the solution space splits into disjoint regions. We identify how this transition point and the storage capacity depend on the choice of activation function. These findings contribute to understanding the influence of activation functions and the number of parameters on the structure of the solution space, potentially offering insights for selecting appropriate architectures for specific objectives.
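To make the quantities in the abstract concrete, here is a minimal sketch of the setup in standard notation; the symbols $N$, $K$, $\mathbf{w}_k$, $a_k$, $g$, and $P_{\max}$ are our own choices and are not taken from the text. A fully connected two-layer binary classifier with $K$ hidden units, input dimension $N$, and activation function $g$ can be written as

\[
f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{k=1}^{K} a_k \, g\!\left(\frac{\mathbf{w}_k \cdot \mathbf{x}}{\sqrt{N}}\right)\right),
\qquad
\alpha_c = \frac{P_{\max}}{NK},
\]

where $P_{\max}$ is the largest number of random pairs $(\mathbf{x}^{\mu}, y^{\mu}) \in \mathbb{R}^N \times \{\pm 1\}$ that the network can classify correctly, and $\alpha_c$ is the storage capacity per parameter (the $NK$ hidden weights dominate the parameter count). On this reading, the abstract's first claim is that $\alpha_c$ remains finite as $K \to \infty$, and the phase transition refers to a dataset size at which solutions related by permuting the hidden units $k = 1, \dots, K$ cease to be connected in weight space.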