How Can We Tame the Long-Tail of Chest X-ray Datasets?

From arXiv. Extended abstract presented at the Computer Vision for Automated Medical Diagnosis Workshop at the International Conference on Computer Vision 2023, October 2nd 2023, Paris, France, and virtual, https://cvamd2023.github.io, 7 pages.

Chest X-rays (CXRs) are a medical imaging modality used to infer a large number of abnormalities. While it is hard to define an exhaustive list of these abnormalities, which may co-occur on a single chest X-ray, a few of them are commonly observed and are abundantly represented in the CXR datasets used to train deep learning models for automated inference. However, current models struggle to learn independent discriminative features for labels that are rare but may be of high significance. Prior works address the combination of the multi-label and long-tail problems by introducing novel loss functions or some mechanism for re-sampling or re-weighting the data. Instead, we propose that significant performance gains can be achieved merely by choosing a model initialization that is closer to the domain of the target dataset. This method can complement the techniques proposed in the existing literature and can easily be scaled to new labels. Finally, we also examine the veracity of synthetically generated data used to augment the tail labels and analyse its contribution to improving model performance.
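
To make the re-weighting idea attributed to prior works concrete, the following is a minimal sketch of inverse-frequency label weights for a multi-label dataset. It is not the method proposed in this abstract, which concerns model initialization. The function name `inverse_frequency_weights`, the toy annotations, and the negative-to-positive ratio used as the weight are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(label_sets: List[List[str]]) -> Dict[str, float]:
    """Return a per-label weight equal to n_negative / n_positive.

    Rare (tail) labels receive large weights and common (head) labels
    receive small ones. Hypothetical helper, for illustration only.
    """
    n_images = len(label_sets)
    # Count each label at most once per image, since labels may co-occur.
    positives = Counter(label for labels in label_sets for label in set(labels))
    return {
        label: (n_images - count) / count
        for label, count in positives.items()
    }


# Toy, made-up annotations: one list of labels per image.
toy_labels = [
    ["Effusion"],
    ["Effusion", "Atelectasis"],
    ["Effusion"],
    ["Atelectasis"],
    ["Effusion", "Pneumomediastinum"],
    [],
    ["Effusion"],
    ["Effusion"],
]

weights = inverse_frequency_weights(toy_labels)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

In this toy set the head label appears in six of eight images and the rarest label in one, so the rarest label ends up with the largest weight. Weights of this kind are one option for the positive-class weight of a per-label binary cross-entropy loss; whether they help on a given CXR dataset is an empirical question.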
