MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning

Self-supervised learning (SSL) has the potential to reduce the need for manual annotation and to make deep learning models more accessible for medical image analysis. By leveraging representations learned from unlabeled data, self-supervised models perform well on tasks that require little to no fine-tuning. However, medical images such as chest X-rays are characterized by complex anatomical structures and diverse clinical conditions, so they call for representation learning techniques that encode fine-grained details while preserving broader contextual information. In this context, we introduce MLVICX (Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning), an approach that captures rich representations of chest X-ray images in the form of embeddings. Central to our approach is a novel multi-level variance and covariance exploration strategy that enables the model to detect diagnostically meaningful patterns while effectively reducing redundancy. By enhancing the variance and covariance of the learned embeddings, MLVICX promotes the retention of critical medical insights by adapting both global and local contextual details. We evaluate MLVICX through comprehensive experiments. The performance gains observed across various downstream tasks highlight the value of the proposed approach in making chest X-ray embeddings more useful for precision medical diagnosis and comprehensive image analysis. For pretraining, we used the NIH-Chest X-ray dataset; for downstream tasks, we used the NIH-Chest X-ray, Vinbig-CXR, RSNA pneumonia, and SIIM-ACR Pneumothorax datasets. Overall, we observe performance gains of more than 3% over state-of-the-art (SOTA) SSL approaches in various downstream tasks.

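The abstract does not spell out the loss, so the following is only a rough sketch of what variance and covariance terms over embeddings from several levels might look like. It borrows the well-known VICReg-style formulation (a hinge on per-dimension standard deviation plus a penalty on off-diagonal covariance) as a stand-in and is not the authors' MLVICX objective. The function names, the `gamma` and `eps` values, the weights, and the idea of summing over a list of per-level embedding matrices are all assumptions made for illustration.

```python
import numpy as np


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge penalty that is positive when a dimension's std falls below gamma.

    z is an (n_samples, n_dims) embedding matrix. gamma and eps are assumed values.
    """
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))


def covariance_term(z):
    """Sum of squared off-diagonal covariances, divided by the number of dimensions.

    The value is zero when the embedding dimensions are mutually uncorrelated.
    """
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)


def multi_level_penalty(levels, var_weight=1.0, cov_weight=1.0):
    """Hypothetical aggregation: add the two terms over every level's embeddings.

    levels is a list of (n_samples, n_dims) arrays, e.g. one global and one local.
    """
    return sum(
        var_weight * variance_term(z) + cov_weight * covariance_term(z)
        for z in levels
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_emb = rng.normal(size=(32, 8))  # stand-in for image-level embeddings
    local_emb = rng.normal(size=(32, 8))   # stand-in for region-level embeddings
    print(multi_level_penalty([global_emb, local_emb]))
```

In this sketch, a collapsed batch (all embeddings identical) is penalized by the variance term, while duplicated dimensions are penalized by the covariance term; how MLVICX actually combines global and local levels is described in the paper body rather than in the abstract.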