Curriculum learning couples two design choices: how samples are scored by difficulty, and how harder samples are paced into training. This coupling makes it difficult to attribute observed gains to either component. We disentangle these factors with two evaluation protocols: stage-wise test subsets that validate scoring functions independently of curriculum training, and a baseline that applies the same pacing schedule to randomly ordered data. Within the Transfer Teacher framework (TTF), we use these protocols to evaluate a confusion-aware difficulty score that considers both the correct-class confidence and the probability distribution over incorrect classes. On CIFAR-10 with ResNet-18 and VGG-16, the proposed score produces model-interpretable difficulty rankings that align with human intuition. However, when training on the full dataset, neither curriculum nor anti-curriculum ordering improves accuracy over standard training, indicating that improving the scoring function alone is insufficient to overcome the known failure modes of curriculum learning in TTF. In contrast, we find that confusion-aware curriculum ordering yields consistent data-efficiency benefits, outperforming random ordering by up to 8.7 percentage points in the 20% data regime, which suggests the potential of TTF as a data-efficient training method.
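To make the second protocol concrete, the following is a minimal sketch of how one pacing schedule can be shared between a difficulty-sorted ordering and a randomly ordered baseline, so that any accuracy gap reflects the ordering rather than the pacing. The abstract does not give the score's formula or the pacing function, so everything named here is an assumption for illustration: `confusion_aware_score` is a stand-in that adds the largest incorrect-class probability to one minus the correct-class confidence, and `linear_pacing` with `start_frac` is a hypothetical linear schedule.

```python
import random


def confusion_aware_score(probs, label):
    """Stand-in difficulty score (assumption, not the paper's formula).

    Higher means harder. Combines low correct-class confidence with
    how much probability mass sits on the single most confusing class.
    """
    p_correct = probs[label]
    top_wrong = max(p for c, p in enumerate(probs) if c != label)
    return (1.0 - p_correct) + top_wrong


def make_orders(scores, seed=0):
    """Build the three sample orderings that share one pacing schedule."""
    n = len(scores)
    curriculum = sorted(range(n), key=lambda i: scores[i])  # easy first
    anti = list(reversed(curriculum))                       # hard first
    shuffled = list(range(n))
    random.Random(seed).shuffle(shuffled)                   # random baseline
    return {"curriculum": curriculum, "anti": anti, "random": shuffled}


def linear_pacing(step, total_steps, n_samples, start_frac=0.2):
    """Hypothetical linear pacing: number of samples available at `step`."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    return max(1, int(frac * n_samples))


def available_indices(order, step, total_steps, start_frac=0.2):
    """Prefix of `order` that the pacing schedule exposes at `step`."""
    k = linear_pacing(step, total_steps, len(order), start_frac)
    return order[:k]


if __name__ == "__main__":
    # Toy teacher outputs for three samples whose true class is 0.
    probs = [[0.9, 0.05, 0.05], [0.4, 0.5, 0.1], [0.4, 0.3, 0.3]]
    labels = [0, 0, 0]
    scores = [confusion_aware_score(p, y) for p, y in zip(probs, labels)]
    orders = make_orders(scores, seed=0)
    for name, order in orders.items():
        print(name, available_indices(order, step=5, total_steps=10))
```

Because the three orderings pass through the same `available_indices` call, the random baseline sees exactly as many samples at each step as the curriculum does. Only which samples arrive first differs.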