Weak-to-strong (W2S) generalization is a type of finetuning (FT) in which a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, the W2S-finetuned student often outperforms its weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging this low intrinsic dimensionality, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This reveals a virtue of discrepancy between the strong and weak models in W2S: the strong student inherits the variance of the weak teacher in $\mathcal{V}_s \cap \mathcal{V}_w$, whereas in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ this variance is reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$, where $N$ is the number of pseudo-labels used for W2S. Our analysis further sheds light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.
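The variance claim above can be illustrated with a minimal toy simulation. This is a sketch under simplifying assumptions that are ours and not stated in the abstract: the feature subspaces are coordinate subspaces of an isotropic Gaussian latent vector, the target lies in $\mathcal{V}_s \cap \mathcal{V}_w$ (so both models have zero bias), and the label noise is Gaussian. The function name `simulate_w2s` and all of its parameters are hypothetical. The quantity `predicted` is the leading-order variance suggested by the description above, $\frac{\sigma^2}{n}\big(d_{s \wedge w} + \frac{d_s}{N}(d_w - d_{s \wedge w})\big)$, with $n$ noisy labels for the teacher and $d_{s \wedge w} = \mathrm{dim}(\mathcal{V}_s \cap \mathcal{V}_w)$; it should be read as an approximation for this toy setting, not as the exact statement of the result.

```python
import numpy as np

def simulate_w2s(d_s=20, d_w=40, d_overlap=10, n=400, N=2000,
                 sigma=1.0, trials=200, seed=0):
    """Toy W2S experiment with least-squares teacher and student.

    Assumptions (ours): isotropic Gaussian latent features, coordinate
    subspaces, and a target supported on the shared coordinates.
    Returns (mean teacher excess risk, mean student excess risk,
    leading-order predicted student variance).
    """
    rng = np.random.default_rng(seed)
    D = d_s + d_w - d_overlap                 # ambient latent dimension
    S = np.arange(d_s)                        # student coordinates (V_s)
    W = np.arange(d_s - d_overlap, D)         # teacher coordinates (V_w)
    shared = np.arange(d_s - d_overlap, d_s)  # V_s ∩ V_w

    theta = np.zeros(D)
    theta[shared] = rng.standard_normal(d_overlap)  # target lives in the overlap

    teacher_err, student_err = [], []
    for _ in range(trials):
        # Weak teacher: least squares on n noisy labels, using only V_w.
        Z = rng.standard_normal((n, D))
        y = Z @ theta + sigma * rng.standard_normal(n)
        w_hat, *_ = np.linalg.lstsq(Z[:, W], y, rcond=None)

        # Strong student: least squares on N pseudo-labels, using only V_s.
        Zu = rng.standard_normal((N, D))
        pseudo = Zu[:, W] @ w_hat
        s_hat, *_ = np.linalg.lstsq(Zu[:, S], pseudo, rcond=None)

        # With isotropic features, excess risk is the squared coefficient error.
        w_full = np.zeros(D)
        w_full[W] = w_hat
        s_full = np.zeros(D)
        s_full[S] = s_hat
        teacher_err.append(np.sum((w_full - theta) ** 2))
        student_err.append(np.sum((s_full - theta) ** 2))

    predicted = sigma**2 / n * (d_overlap + d_s / N * (d_w - d_overlap))
    return float(np.mean(teacher_err)), float(np.mean(student_err)), predicted

if __name__ == "__main__":
    t, s, p = simulate_w2s()
    print(f"teacher: {t:.4f}  student: {s:.4f}  predicted: {p:.4f}")
```

In this toy setting the teacher's coefficients on $\mathcal{V}_w \setminus \mathcal{V}_s$ are pure noise that the student cannot represent, so they act like independent label noise that is averaged over the $N$ pseudo-labels. Shrinking `d_overlap` relative to `d_w` should therefore widen the gap between student and teacher.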