Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions. In the binary case these are composed of an exponential number of algorithmically inaccessible small clusters (the frozen 1RSB phase); in the spherical case they form a hierarchical structure of clusters of different sizes (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, since they cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins.
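To make the setting concrete, here is a minimal sketch of the negative-margin perceptron constraints, under conventions we assume rather than take from the paper: P random patterns xi^mu in {-1,+1}^N, constraint density alpha = P/N, normalized stabilities Delta^mu = y^mu (w . xi^mu)/sqrt(N) with ||w||^2 = N, and w counted as a solution when every Delta^mu >= kappa for a margin kappa < 0. Binary weights live on {-1,+1}^N; spherical weights are real vectors projected onto the sphere. The helper names (`stabilities`, `is_solution`, `to_sphere`, `neighbour_solution_fraction`) are hypothetical. The neighbourhood probe only samples binary vectors at a fixed Hamming distance from a solution and reports the fraction that still satisfy all constraints. This is a crude, finite-size proxy for the local entropy the abstract refers to, not the paper's replica computation. The labels come from a hypothetical binary teacher (one reading of "random rules"), so that a known solution exists for any kappa <= 0.

```python
import numpy as np

def random_patterns(P, N, rng):
    """P random binary input patterns xi^mu in {-1,+1}^N (one per row)."""
    return rng.choice([-1.0, 1.0], size=(P, N))

def stabilities(w, xi, y):
    """Normalized stabilities Delta^mu = y^mu (w . xi^mu) / sqrt(N).

    Assumes ||w||^2 = N, which holds automatically for binary +/-1 weights
    and is enforced by `to_sphere` for spherical weights.
    """
    N = w.shape[0]
    return y * (xi @ w) / np.sqrt(N)

def is_solution(w, xi, y, kappa):
    """True if every stability is at least kappa (negative margin: kappa < 0)."""
    return bool(np.all(stabilities(w, xi, y) >= kappa))

def to_sphere(w):
    """Rescale a real vector onto the sphere ||w||^2 = N (spherical model)."""
    return w * np.sqrt(w.shape[0]) / np.linalg.norm(w)

def neighbour_solution_fraction(w, xi, y, kappa, n_flips, n_samples, rng):
    """Fraction of sampled binary vectors at Hamming distance n_flips from w
    that are still solutions. Only a finite-size proxy for local entropy."""
    N = w.shape[0]
    hits = 0
    for _ in range(n_samples):
        w2 = w.copy()
        flip = rng.choice(N, size=n_flips, replace=False)  # distinct indices
        w2[flip] *= -1.0
        hits += is_solution(w2, xi, y, kappa)
    return hits / n_samples

rng = np.random.default_rng(0)
N, alpha, kappa = 201, 0.5, -0.5   # N odd, so w . xi (a sum of N +/-1 terms) is never zero
P = int(alpha * N)                 # constraint density alpha = P / N
xi = random_patterns(P, N, rng)

# Hypothetical "rule" setting: labels produced by a binary teacher, so the
# teacher's stabilities are all positive and it solves the problem for kappa <= 0.
teacher = rng.choice([-1.0, 1.0], size=N)
y = np.sign(xi @ teacher)

print(is_solution(teacher, xi, y, kappa))   # True by construction
for d in (1, 5, 20):
    print(d, neighbour_solution_fraction(teacher, xi, y, kappa, d, 200, rng))

# Spherical model: same constraints, continuous weights on the sphere.
w_sph = to_sphere(rng.standard_normal(N))
print(np.isclose(w_sph @ w_sph, N))         # True
```

A planted teacher is used only so that a solution is guaranteed to exist without running a search algorithm. Measuring how the neighbour fraction decays with distance, for solutions found by different algorithms and at increasing alpha, is the kind of comparison between wide flat minima and isolated solutions that the abstract describes.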

