Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions, which consist of an exponential number of algorithmically inaccessible small clusters in the binary case (the frozen 1-RSB phase) or of a hierarchical structure of clusters of different sizes in the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins.
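To make the setting concrete, the following is a minimal sketch of the constraint satisfaction problem that a negative-margin perceptron defines, under assumptions not stated in the abstract: patterns are i.i.d. Gaussian, labels are random signs, stabilities are normalized by sqrt(N), spherical weights satisfy |w|^2 = N, and the constraint density is alpha = P/N. All function names are hypothetical and chosen for illustration only.

```python
import math
import random


def stabilities(w, patterns, labels):
    """Return y_mu * (w . xi_mu) / sqrt(N) for every pattern (assumed normalization)."""
    n = len(w)
    return [
        y * sum(wi * xi for wi, xi in zip(w, pattern)) / math.sqrt(n)
        for pattern, y in zip(patterns, labels)
    ]


def is_solution(w, patterns, labels, kappa):
    """A weight vector is a solution if every stability is at least the margin kappa.

    With kappa < 0 (negative margin), some patterns may be misclassified,
    as long as none is violated by more than |kappa|.
    """
    return all(s >= kappa for s in stabilities(w, patterns, labels))


def random_instance(n, alpha, rng):
    """Draw P = int(alpha * n) Gaussian patterns with random +/-1 labels (assumption)."""
    p = int(alpha * n)
    patterns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    labels = [rng.choice((-1, 1)) for _ in range(p)]
    return patterns, labels


def random_binary_weights(n, rng):
    """Binary model: each weight is +1 or -1."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def random_spherical_weights(n, rng):
    """Spherical model: continuous weights rescaled so that |w|^2 = n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x * math.sqrt(n) / norm for x in g]


if __name__ == "__main__":
    rng = random.Random(0)
    patterns, labels = random_instance(n=50, alpha=0.5, rng=rng)
    w = random_spherical_weights(50, rng)
    print("random spherical w is a solution at kappa = -1:",
          is_solution(w, patterns, labels, kappa=-1.0))
```

This only checks whether a given weight vector satisfies all constraints. It does not compute the local entropy or the SAT/UNSAT threshold discussed in the abstract.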

