Beyond class frequency, we recognize the impact of class-wise relationships among various class-specific predictions and of the imbalance in label masks on long-tailed segmentation learning. To address these challenges, we propose an innovative Pixel-wise Adaptive Training (PAT) technique tailored for long-tailed segmentation. PAT has two key features: 1) class-wise gradient magnitude homogenization, and 2) pixel-wise class-specific loss adaptation (PCLA). First, class-wise gradient magnitude homogenization alleviates the imbalance among label masks by ensuring that each class contributes equally to model updates. Second, PCLA tackles the detrimental impact of both rare classes within the long-tailed distribution and inaccurate predictions from previous training stages, by encouraging the learning of classes with low prediction confidence while guarding against forgetting classes with high confidence. This combined approach fosters robust learning while preventing the model from forgetting previously learned knowledge. PAT exhibits significant performance improvements, surpassing the current state-of-the-art by 2.2% on the NYU dataset. Moreover, it improves overall pixel-wise accuracy by 2.85% and intersection over union by 2.07%, with a particularly notable decline of 0.39% in detecting rare classes compared to Balance Logits Variation, as demonstrated on three popular datasets, i.e., OxfordPetIII, CityScape, and NYU.
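To make the two features concrete, the following is a minimal NumPy sketch of the general idea only, not the authors' exact formulation: per-pixel cross-entropy terms are first rescaled by the inverse pixel count of their class (homogenizing the class-wise contribution to the update, regardless of mask size), then reweighted by prediction confidence so that low-confidence pixels are emphasized and high-confidence pixels are largely left alone. The function name `pat_style_loss` and all specifics are illustrative assumptions.

```python
import numpy as np

def pat_style_loss(probs, labels, eps=1e-8):
    """Illustrative sketch (NOT the paper's exact loss).

    probs:  (H, W, C) softmax outputs
    labels: (H, W) integer class ids
    """
    H, W, C = probs.shape
    # probability assigned to the true class at each pixel
    p_true = probs[np.arange(H)[:, None], np.arange(W)[None, :], labels]
    ce = -np.log(p_true + eps)

    # 1) class-wise homogenization (assumed form): scale each pixel's
    # term by the inverse of its class's pixel count, so every class
    # present in the mask contributes comparably to the update.
    counts = np.bincount(labels.ravel(), minlength=C).astype(float)
    ce = ce / counts[labels]

    # 2) pixel-wise adaptation (assumed form): down-weight pixels the
    # model already predicts confidently, up-weight uncertain ones.
    ce = (1.0 - p_true) * ce

    # average over the classes actually present
    return float(ce.sum() / len(np.unique(labels)))
```

Under this sketch, a confidently correct prediction yields a much smaller loss than a uniform one, which is the intended "learn what is uncertain, keep what is known" behavior the abstract describes.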