We characterize the equilibrium properties of a model of $y$ coupled binary perceptrons in the teacher-student scenario, subject to a learning rule, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which thermal noise is present and affects each student's generalization performance. In the nonzero-temperature regime, we find that coupling the replicas bends the phase diagram towards smaller values of $\alpha$. This suggests that, at a fixed fraction of examples, the free energy landscape becomes smoother around the solution with perfect generalization (i.e., the teacher's), allowing standard thermal updates such as Simulated Annealing to reach the teacher solution easily and to avoid the entrapment in metastable states that occurs in the unreplicated case, even in the so-called computationally easy regime. These results provide additional analytic and numerical evidence for the recently conjectured Bayes-optimal property of Replicated Simulated Annealing (RSA) for a sufficient number of replicas. From a learning perspective, these results also suggest that multiple students working together (in this case reviewing the same data) are able to learn the same rule both significantly faster and with fewer examples, a property that could be exploited in the context of cooperative and federated learning.
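To make the coupled model concrete, the following is a minimal sketch of one possible energy function for $y$ binary students. It is illustrative only and rests on assumptions not stated in the abstract: the per-student cost is taken to be the number of misclassified training examples, the coupling strength is a hypothetical parameter `gamma`, and $\alpha$ is read in the usual way as the number of examples divided by the number of weights. The function names are likewise hypothetical.

```python
import numpy as np

def hamming(w_a, w_b):
    """Number of positions where two +/-1 weight vectors differ."""
    return int(np.sum(w_a != w_b))

def training_errors(w, X, labels):
    """Count the examples that a student with weights w labels differently from the teacher."""
    preds = np.where(X @ w >= 0, 1, -1)
    return int(np.sum(preds != labels))

def replicated_energy(W, X, labels, gamma):
    """Assumed energy: per-student training errors plus gamma times the
    sum of pairwise Hamming distances between the students' weights."""
    y = W.shape[0]
    err = sum(training_errors(W[a], X, labels) for a in range(y))
    coupling = sum(hamming(W[a], W[b]) for a in range(y) for b in range(a + 1, y))
    return err + gamma * coupling

rng = np.random.default_rng(0)
n, alpha, y = 101, 1.5, 3                   # odd n, so pre-activations are never zero
p = int(alpha * n)                          # number of examples (assumed alpha = p / n)
teacher = rng.choice([-1, 1], size=n)       # binary teacher weights
X = rng.choice([-1, 1], size=(p, n))        # random binary input patterns
labels = np.where(X @ teacher >= 0, 1, -1)  # labels provided by the teacher
W = rng.choice([-1, 1], size=(y, n))        # y randomly initialized students

print(replicated_energy(W, X, labels, gamma=0.5))
```

Under these assumptions the energy vanishes when every student coincides with the teacher, and a positive `gamma` penalizes students that disagree with one another, which is the sense in which the coupling is ferromagnetic. A thermal update rule such as Simulated Annealing would then propose single weight flips and accept them according to the change in this energy.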