Self-paced curriculum learning (SCL) has demonstrated great potential in computer vision, natural language processing, and other fields. During training, it implements easy-to-hard sampling based on an online estimate of data difficulty. Most SCL methods adopt a loss-based strategy for estimating data difficulty and deweight the 'hard' samples in the early training stage. Despite its success in a variety of applications, SCL still confronts two challenges in medical image analysis tasks such as universal lesion detection (ULD), where data are insufficient and highly class-imbalanced: (i) the loss-based difficulty measurer is inaccurate; (ii) the hard samples are under-utilized because of the deweighting mechanism. To overcome these challenges, we propose a novel mixed-order self-paced curriculum learning (Mo-SCL) method. We integrate both uncertainty and loss to better estimate difficulty online, and we mix hard and easy samples in the same mini-batch to alleviate the under-utilization of hard samples. We provide a theoretical investigation of our method in the context of stochastic gradient descent optimization, together with extensive experiments on the DeepLesion benchmark dataset for ULD. When applied to two state-of-the-art ULD methods, the proposed Mo-SCL method provides a free boost to lesion detection accuracy, requiring no additional special network design.
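The two ideas named in the abstract, a difficulty score that combines loss with uncertainty and mini-batches that contain both easy and hard samples, can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration only: the min-max normalization, the convex combination weighted by `alpha`, the fixed `hard_fraction`, and the function names `difficulty_scores` and `mixed_order_batches` are hypothetical and are not taken from the Mo-SCL method itself.

```python
import random


def difficulty_scores(losses, uncertainties, alpha=0.5):
    """Combine per-sample loss and uncertainty into one difficulty score.

    Assumption (not from the paper): each signal is min-max normalized
    and the two are mixed with a convex weight alpha.
    """
    def minmax(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    norm_loss = minmax(losses)
    norm_unc = minmax(uncertainties)
    return [alpha * l + (1.0 - alpha) * u for l, u in zip(norm_loss, norm_unc)]


def mixed_order_batches(scores, batch_size, hard_fraction=0.5, seed=0):
    """Build index batches that each hold both easy and hard samples.

    Assumption (not from the paper): the hardest `hard_fraction` of the
    samples form the hard pool, and every batch draws the same fraction
    of its slots from that pool.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # easy -> hard
    n_hard_total = int(len(order) * hard_fraction)
    split = len(order) - n_hard_total
    easy, hard = order[:split], order[split:]
    rng.shuffle(easy)
    rng.shuffle(hard)

    n_hard = int(batch_size * hard_fraction)
    n_easy = batch_size - n_hard
    batches = []
    # Stop as soon as either pool cannot fill its share of a batch.
    while len(easy) >= n_easy and len(hard) >= n_hard:
        batch = [easy.pop() for _ in range(n_easy)]
        batch += [hard.pop() for _ in range(n_hard)]
        batches.append(batch)
    return batches


# Toy per-sample statistics; in practice these would come from the model.
losses = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
uncertainties = [0.2, 0.8, 0.1, 0.9, 0.3, 0.7, 0.4, 0.6]
scores = difficulty_scores(losses, uncertainties)
batches = mixed_order_batches(scores, batch_size=4)
```

In this toy setting every batch holds two samples from the easier half and two from the harder half, in contrast to a deweighting scheme that would largely suppress the hard samples early in training. A curriculum would presumably vary the mix as training progresses; that schedule is not described in the abstract and is left out here.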