In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights in the classification setting. I review Gardner's approach based on the replica method and the derivation of the SAT/UNSAT transition in the storage setting. I then discuss recent works that unveiled how the zero-training-error configurations are geometrically arranged, and how this arrangement changes as the size of the training set increases. I also illustrate how different regions of solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I give evidence that, in binary weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, I demonstrate how the study of linear mode connectivity between solutions can give insights into the average shape of the solution manifold.