Knowledge distillation enables fast and effective transfer of features learned by a larger model to a smaller one. However, distillation objectives are susceptible to sub-population shift, a common scenario in medical image analysis in which some groups or domains of data are underrepresented in the training set. For instance, models trained on health data acquired from multiple scanners or hospitals can perform poorly on minority groups. In this paper, inspired by distributionally robust optimization (DRO) techniques, we address this shortcoming by proposing a group-aware distillation loss. During optimization, a set of group weights is updated based on the per-group losses at the current iteration. This way, our method dynamically focuses on groups that perform poorly during training. We empirically validate our method, GroupDistil, on two benchmark datasets (natural images and cardiac MRIs) and show consistent improvement in worst-group accuracy.
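To make the weighting idea concrete, here is a minimal pure-Python sketch of one step of a group-weighted distillation loss. The abstract does not state the exact update rule, so several choices here are assumptions: the distillation term is a temperature-softened KL divergence between teacher and student outputs, and the group weights follow an exponentiated update of the kind commonly used in group DRO. All function names, the step size, the temperature, and the toy logits are hypothetical and are not taken from the GroupDistil implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def per_group_distill_losses(teacher_logits, student_logits, groups,
                             num_groups, temperature=2.0):
    """Average teacher-to-student KL per group (assumed distillation term)."""
    sums = [0.0] * num_groups
    counts = [0] * num_groups
    for t, s, g in zip(teacher_logits, student_logits, groups):
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        sums[g] += kl_divergence(p, q)
        counts[g] += 1
    # Groups absent from the batch contribute a loss of zero.
    return [total / n if n > 0 else 0.0 for total, n in zip(sums, counts)]


def update_group_weights(weights, group_losses, step_size=0.1):
    """Assumed group-DRO-style update: raise the weight of high-loss groups."""
    unnormalized = [w * math.exp(step_size * loss)
                    for w, loss in zip(weights, group_losses)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def group_aware_loss(weights, group_losses):
    """Weighted sum of per-group losses; this is what the student minimizes."""
    return sum(w * loss for w, loss in zip(weights, group_losses))


# Toy batch: group 0 is matched well by the student, group 1 is not.
teacher = [[4.0, 0.0], [3.0, 0.5], [0.0, 4.0]]
student = [[3.8, 0.1], [2.9, 0.4], [2.0, 1.0]]
groups = [0, 0, 1]

weights = [0.5, 0.5]  # start from uniform group weights
losses = per_group_distill_losses(teacher, student, groups, num_groups=2)
weights = update_group_weights(weights, losses)
loss = group_aware_loss(weights, losses)
```

In this toy batch the student disagrees with the teacher on the single group-1 sample, so group 1 receives the larger weight after the update and dominates the weighted loss. In a real training loop the per-group losses would come from a differentiable framework and the weights would be carried across iterations.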