To achieve near-zero training error in a classification problem, the layers of a feed-forward network must disentangle the manifolds of data points with different labels to facilitate discrimination. However, excessive class separation can lead to overfitting, since good generalisation requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimisation dynamics finds representations that balance these opposing tendencies with a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across data sets and architectures) increases the class entanglement. The training error at the inversion is stable under subsampling and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set, which we call "stragglers", that are particularly influential for generalisation.