We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions. In the binary case these are composed of an exponential number of algorithmically inaccessible small clusters (the frozen 1RSB phase). In the spherical case they form a hierarchical structure of clusters of different sizes (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins.
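To make the setting concrete, here is a minimal sketch. It assumes the standard formulation of the negative-margin perceptron, which this abstract does not spell out. There are N weights and P = alpha*N random +/-1 patterns xi^mu with random +/-1 labels y^mu (the random-associations case). A weight vector w counts as a solution when every stability Delta^mu = y^mu (w . xi^mu) / sqrt(N) is at least the margin kappa < 0. Binary weights live in {-1, +1}^N, while spherical weights satisfy |w|^2 = N. All function names below are hypothetical, chosen for illustration. Because kappa < 0, a constraint can be satisfied even by a pattern that w misclassifies. On the sphere, each constraint excludes only a spherical cap, which is why the feasible set is non-convex.

```python
import math
import random

def random_associations(n, p, seed=0):
    """Draw p random +/-1 patterns of dimension n with random +/-1 labels (assumed random-associations setup)."""
    rng = random.Random(seed)
    xs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    ys = [rng.choice((-1, 1)) for _ in range(p)]
    return xs, ys

def stabilities(w, xs, ys):
    """Stability Delta_mu = y_mu * (w . x_mu) / sqrt(N) for each pattern."""
    n = len(w)
    return [y * sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(n)
            for x, y in zip(xs, ys)]

def is_solution(w, xs, ys, kappa):
    """w is a solution iff every stability is >= kappa (kappa < 0 in the negative-margin regime)."""
    return all(d >= kappa for d in stabilities(w, xs, ys))

def random_binary_weights(n, rng):
    """A uniformly random point of the binary model, w in {-1, +1}^N."""
    return [rng.choice((-1, 1)) for _ in range(n)]

def random_spherical_weights(n, rng):
    """A uniformly random point of the spherical model, normalized so that |w|^2 = N."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [math.sqrt(n) * v / norm for v in g]

if __name__ == "__main__":
    # Illustrative run: constraint density alpha = P/N = 0.5, margin kappa = -1.
    n, alpha, kappa = 200, 0.5, -1.0
    xs, ys = random_associations(n, int(alpha * n), seed=1)
    rng = random.Random(2)
    trials = 200
    hits_bin = sum(is_solution(random_binary_weights(n, rng), xs, ys, kappa) for _ in range(trials))
    hits_sph = sum(is_solution(random_spherical_weights(n, rng), xs, ys, kappa) for _ in range(trials))
    print(f"random binary weights that solve all constraints:    {hits_bin}/{trials}")
    print(f"random spherical weights that solve all constraints: {hits_sph}/{trials}")
```

Sampling random weights like this only probes the typical region of weight space. It says nothing about the rare wide flat minima discussed above, which require dedicated algorithms or local-entropy calculations to locate.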