The weak-to-strong generalization phenomenon drives important machine learning applications, including highly data-efficient learning and, most recently, superalignment. While decades of research have produced numerous algorithms with strong empirical performance, which aspects of data enable weak-to-strong generalization remains poorly understood. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of overlap points, i.e., points containing both easy patterns (learnable by a weak model) and challenging patterns (learnable only by a stronger model); on such points, weak predictions can be used by stronger models to learn the challenging patterns. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density, along with a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm across a wide array of settings.
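The overlap intuition can be illustrated with a toy simulation (a hedged sketch, not the paper's algorithm; the features, masks, and the lookup-table "strong model" below are all invented for illustration). Points carry an easy binary feature usable by the weak model and a challenging feature only the strong model can read; on overlap points both features agree with the label, so weak pseudo-labels transfer the signal to the challenging feature:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y = rng.integers(0, 2, n)  # ground-truth binary labels

# Easy pattern present on ~60% of points; challenging pattern on ~70%.
easy_mask = rng.random(n) < 0.6
hard_mask = rng.random(n) < 0.7
# Where a pattern is present its feature equals the label; otherwise it is noise.
x_easy = np.where(easy_mask, y, rng.integers(0, 2, n))
x_hard = np.where(hard_mask, y, rng.integers(0, 2, n))

# Overlap density: fraction of points carrying BOTH patterns (~0.42 here).
overlap_density = (easy_mask & hard_mask).mean()

# Weak model: can only read the easy feature.
weak_pred = x_easy

# Stand-in "strong model": trained on the weak model's pseudo-labels, but able
# to exploit the challenging feature -- a majority-vote lookup per x_hard value.
lookup = {v: int(np.round(weak_pred[x_hard == v].mean())) for v in (0, 1)}
strong_pred = np.array([lookup[v] for v in x_hard])

# On points where ONLY the challenging pattern is present, the weak model is at
# chance, but the strong model recovers the signal learned via overlap points.
hard_only = hard_mask & ~easy_mask
weak_acc = (weak_pred[hard_only] == y[hard_only]).mean()
strong_acc = (strong_pred[hard_only] == y[hard_only]).mean()
```

Because overlap points are the ones where the (mostly correct) weak pseudo-labels co-occur with the challenging feature, raising their density directly strengthens the signal the strong model can learn from, which is the mechanism the abstract describes.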