Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds

We study the capacity of \emph{sign} perceptrons neural networks (SPNN) and focus in particular on 1-hidden layer \emph{treelike committee machine} (TCM) architectures. As in the case of a single perceptron neuron, it turns out that, in a statistical sense, the capacity of a multilayered network architecture consisting of multiple \emph{sign} perceptrons also undergoes the so-called phase transition (PT) phenomenon. This means: (i) for a certain range of system parameters (size of data, number of neurons), the network can be properly trained to accurately memorize \emph{all} elements of the input dataset; and (ii) outside this range no such training exists. Clearly, determining the phase transition curve that separates these regions is an extraordinary task and among the most fundamental questions related to the performance of any network. Utilizing a powerful mathematical engine called Random Duality Theory (RDT), we establish a generic framework for determining upper bounds on the 1-hidden layer TCM SPNN capacity. Moreover, we do so for \emph{any} given (odd) number of neurons $d$. We further show that the obtained results \emph{exactly} match the replica symmetry predictions of \cite{EKTVZ92,BHS92}, thereby proving that the statistical physics based results are not only nice estimates but also mathematically rigorous bounds. Finally, for $d\leq 5$, we obtain capacity values that improve on the best known rigorous ones of \cite{MitchDurb89}, thereby marking the first mathematically rigorous progress in well over 30 years.
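To make the memorization notion above concrete, the following is a minimal sketch of a 1-hidden layer TCM with sign activations. It is an illustration under stated assumptions, not the paper's construction. The input of length n is split into d disjoint blocks, each hidden neuron sees only its own block, and the output is the sign of the sum of the hidden signs. The function names `tcm_output` and `memorizes`, the equal block size n/d, and the convention that a zero pre-activation maps to +1 are all hypothetical choices made for this example.

```python
def sign(v):
    # Sign activation; mapping exactly zero to +1 is an assumption of this sketch.
    return 1 if v >= 0 else -1


def tcm_output(weights, x):
    """Output of a treelike committee machine with sign activations.

    weights: list of d weight vectors (d odd), each of length len(x) // d.
    x: input vector, split into d disjoint, equally sized blocks.
    """
    d = len(weights)
    block = len(x) // d
    votes = []
    for i, w in enumerate(weights):
        xi = x[i * block:(i + 1) * block]  # neuron i sees only its own block
        votes.append(sign(sum(a * b for a, b in zip(w, xi))))
    # Majority vote of the hidden neurons; with d odd the sum is never zero.
    return sign(sum(votes))


def memorizes(weights, data, labels):
    # True when the network reproduces every label in the dataset.
    return all(tcm_output(weights, x) == y for x, y in zip(data, labels))


if __name__ == "__main__":
    weights = [[1, 0], [0, 1], [1, 1]]  # d = 3 neurons, n = 6 inputs
    data = [[1, 2, -3, 4, 5, -6], [-1, 0, 0, -2, 1, 1]]
    labels = [1, -1]
    print(memorizes(weights, data, labels))
```

In this picture, the capacity question asks how large a randomly labeled dataset can be, relative to n, before no choice of `weights` makes `memorizes` return true.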

