Expert annotation of 3D medical images for downstream analysis is resource-intensive, posing challenges in clinical applications. Visual self-supervised learning (vSSL), though effective for learning visual invariance, neglects domain knowledge from medicine. To incorporate medical knowledge into visual representation learning, vision-language pre-training (VLP) has shown promising results on 2D images. However, existing VLP approaches are generally impractical for high-resolution 3D medical images, both because of GPU hardware constraints and because downsampling, the intuitive workaround for those constraints, can discard critical details. To address these limitations, we introduce T3D, the first VLP framework designed for high-resolution 3D medical images. T3D incorporates two text-informed pretext tasks: (i) text-informed contrastive learning and (ii) text-informed image restoration. These tasks focus on learning 3D visual representations from high-resolution 3D medical images and integrating clinical knowledge from radiology reports, without distorting information through forced alignment of downsampled volumes with detailed anatomical text. Trained on a newly curated large-scale dataset of 3D medical images and radiology reports, T3D significantly outperforms current vSSL methods on tasks such as organ and tumor segmentation and disease classification. This underlines T3D's potential for representation learning in 3D medical image analysis. All data and code will be made available upon acceptance.