In real-world medical data, training samples are typically both long-tailed and multi-label. The class distribution is long-tailed because the incidence of different diseases varies widely, and images taken from symptomatic patients often show more than one disease. In this paper, we address both issues at once by proposing a robust asymmetric loss built on a polynomial function. Because the loss tackles long-tailed and multi-label classification simultaneously, its design is complex and involves a large number of hyper-parameters. Although many hyper-parameters allow a model to be finely tuned, it is difficult to optimize all of them at the same time, and doing so risks overfitting the model. We therefore regularize the loss function using the Hill loss approach, which makes it less sensitive to the numerous hyper-parameters and thereby reduces the risk of overfitting. As a result, the proposed loss is a generic method that can be applied to most medical image classification tasks without making the training process more time-consuming. We demonstrate that the proposed robust asymmetric loss performs favorably on long-tailed, multi-label medical image classification as well as on various long-tailed single-label datasets. Notably, our method achieves Top-5 results on the CXR-LT dataset of the ICCV CVAMD 2023 competition. We open-source our implementation of the robust asymmetric loss in a public repository: https://github.com/kalelpark/RAL.
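To make the ingredients named in the abstract concrete, the following is a minimal sketch of how an asymmetric treatment of positive and negative labels, a first-order polynomial correction, and a Hill-style weight on negatives could be combined for one multi-label sample. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `asymmetric_poly_hill_loss`, every parameter (`gamma_pos`, `gamma_neg`, `margin`, `eps_pos`, `eps_neg`, `lam`), and all default values are hypothetical choices made for this example. The authors' actual loss is in the repository linked above.

```python
import math


def asymmetric_poly_hill_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                              margin=0.05, eps_pos=1.0, eps_neg=0.0,
                              lam=1.5, tiny=1e-8):
    """Illustrative multi-label loss for one sample (hypothetical sketch).

    probs   : predicted probability per class, each in [0, 1]
    targets : 0/1 label per class
    All hyper-parameter names and defaults are assumptions for this example.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: cross-entropy plus a first-order polynomial
            # term, scaled by a focusing weight (1 - p)^gamma_pos.
            ce = -math.log(max(p, tiny))
            loss = (1.0 - p) ** gamma_pos * (ce + eps_pos * (1.0 - p))
        else:
            # Negative label: shift the probability down by a margin so
            # that very easy negatives contribute nothing.
            p_m = max(p - margin, 0.0)
            ce = -math.log(max(1.0 - p_m, tiny))
            # Hill-style weight (lam - p_m): it shrinks as p_m grows, so
            # confidently "wrong" negatives are penalized less sharply.
            loss = (lam - p_m) * p_m ** gamma_neg * (ce + eps_neg * p_m)
        total += loss
    return total


# Usage: three classes, the first is present, the other two are absent.
good = asymmetric_poly_hill_loss([0.9, 0.02, 0.10], [1, 0, 0])
bad = asymmetric_poly_hill_loss([0.2, 0.80, 0.90], [1, 0, 0])
print(good < bad)
```

In this sketch the asymmetry comes from using different focusing exponents and a probability margin only on the negative side, which is one common way to keep the many negative labels of rare classes from dominating the gradient. Whether the paper applies the polynomial and Hill terms in exactly this form is not stated in the abstract.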