We consider dense associative neural networks trained by a teacher (i.e., with supervision) and investigate their computational capabilities analytically, via the statistical mechanics of spin glasses, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of control parameters such as the quality and quantity of the training dataset, the network storage, and the noise; the diagram is valid in the limit of large network size and structureless datasets. These networks may work in an ultra-storage regime (where they can handle a huge number of patterns compared with shallow neural networks) or in an ultra-detection regime (where they can perform pattern recognition at signal-to-noise ratios that are prohibitive for shallow neural networks). Using the random theory as a reference framework, we also test numerically the learning, storing, and retrieval capabilities of these networks on structured datasets such as MNIST and Fashion-MNIST. As technical remarks: on the analytical side, we implement large deviations and stability analysis within Guerra's interpolation to tackle the non-Gaussian distributions involved in the post-synaptic potentials; on the computational side, we insert the Plefka approximation into the Monte Carlo scheme to speed up the evaluation of the synaptic tensors. Overall, this yields a novel and broad approach to investigating supervised learning in neural networks beyond the shallow limit.
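To make the setting concrete, the following is a minimal sketch of supervised Hebbian learning in a dense associative network on a structureless (random) dataset. It is an illustration under stated assumptions, not the scheme used in the paper: the function names, the parameter values, the energy normalization, and the zero-noise parallel update are all hypothetical choices. In particular, the local fields are computed from the overlaps with the class averages instead of building the synaptic tensor, which is a simplification swapped in here and is not the Plefka approximation mentioned above.

```python
import numpy as np


def make_dataset(rng, n_spins, n_classes, n_examples, quality):
    """Draw random archetypes and noisy examples of each one (assumed setup).

    Each example bit equals the archetype bit with probability (1 + quality) / 2.
    """
    archetypes = rng.choice([-1, 1], size=(n_classes, n_spins))
    keep = rng.random((n_classes, n_examples, n_spins)) < (1 + quality) / 2
    examples = archetypes[:, None, :] * np.where(keep, 1, -1)
    return archetypes, examples


def class_averages(examples):
    """Supervised step: the teacher's labels allow averaging examples per class."""
    return examples.mean(axis=1)


def retrieve(class_means, state, order=4, n_sweeps=10):
    """Zero-noise parallel dynamics for a dense network with `order`-body couplings.

    Assumed energy: -N * sum_mu m_mu**order, where
    m_mu = (1/N) * sum_i class_means[mu, i] * state[i].
    The field on spin i is proportional to sum_mu m_mu**(order - 1) * class_means[mu, i],
    so the `order`-index synaptic tensor is never built explicitly.
    """
    state = state.copy()
    for _ in range(n_sweeps):
        overlaps = class_means @ state / state.size
        fields = (overlaps ** (order - 1)) @ class_means
        new_state = np.where(fields >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break  # reached a fixed point
        state = new_state
    return state


def demo(seed=0):
    """Cue the network with a corrupted archetype and report overlaps before/after."""
    rng = np.random.default_rng(seed)
    n_spins = 200
    archetypes, examples = make_dataset(
        rng, n_spins=n_spins, n_classes=10, n_examples=100, quality=0.4
    )
    means = class_averages(examples)
    # Corrupt archetype 0 by flipping each bit with probability 0.2.
    flip = rng.random(n_spins) < 0.2
    cue = archetypes[0] * np.where(flip, -1, 1)
    final = retrieve(means, cue)
    cue_overlap = float(cue @ archetypes[0]) / n_spins
    final_overlap = float(final @ archetypes[0]) / n_spins
    return cue_overlap, final_overlap


if __name__ == "__main__":
    before, after = demo()
    print(f"overlap with archetype: cue={before:.2f}, retrieved={after:.2f}")
```

Here `quality` and `n_examples` play the role of the dataset quality and quantity, while `order` sets the number of interacting spins per coupling; lowering `quality` or `n_examples` is a simple way to probe where retrieval of the archetype starts to fail in this toy setting.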