The growing use of 3D medical imaging in healthcare has substantially increased the workload of medical professionals. A robust system for retrieving similar cases could assist clinicians in their diagnostic process and reduce this workload. Despite its promise, 3D medical text-image retrieval is currently held back by the lack of robust evaluation benchmarks and curated datasets. To address this gap, we present BIMCV-R (the dataset will be released upon acceptance), a new dataset of 8,069 3D CT volumes, together encompassing over 2 million slices, each paired with its corresponding radiological report. Building on this dataset, we develop MedFinder, a retrieval approach that uses a dual-stream network architecture and leverages large language models to push 3D medical image retrieval beyond existing text-image retrieval solutions. MedFinder is our preliminary step towards a system that supports text-to-image, image-to-text, and keyword-based retrieval.
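To make the retrieval setting concrete, the sketch below shows how a generic dual-stream (dual-encoder) system ranks candidates once each stream has produced an embedding: reports and CT volumes are mapped into a shared vector space, and retrieval in either direction reduces to ranking by cosine similarity. This is a minimal illustration under stated assumptions, not MedFinder itself: the abstract does not specify the encoders, the similarity function, or the evaluation metric, so the precomputed embeddings, the use of cosine similarity, and the Recall@K metric are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine_similarity_matrix(queries: Sequence[Vector],
                             gallery: Sequence[Vector]) -> List[List[float]]:
    """Row i, column j holds the cosine similarity of query i and gallery item j."""
    q = [l2_normalize(v) for v in queries]
    g = [l2_normalize(v) for v in gallery]
    return [[sum(a * b for a, b in zip(qi, gj)) for gj in g] for qi in q]


def rank_gallery(queries: Sequence[Vector],
                 gallery: Sequence[Vector]) -> List[List[int]]:
    """For each query, return gallery indices sorted from most to least similar.

    Hypothetical helper: queries are text embeddings and the gallery holds image
    embeddings for text-to-image retrieval; swap the arguments for image-to-text.
    """
    sims = cosine_similarity_matrix(queries, gallery)
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)
            for row in sims]


def recall_at_k(rankings: List[List[int]], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct gallery item appears in the top k (assumed metric)."""
    hits = sum(1 for i, ranking in enumerate(rankings) if ground_truth[i] in ranking[:k])
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for hypothetical encoder outputs:
    # report i is assumed to describe CT volume i.
    report_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    volume_emb = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.3, 0.9]]

    text_to_image = rank_gallery(report_emb, volume_emb)
    image_to_text = rank_gallery(volume_emb, report_emb)
    print(text_to_image, recall_at_k(text_to_image, [0, 1, 2], k=1))
    print(image_to_text, recall_at_k(image_to_text, [0, 1, 2], k=1))
```

Because both streams share one embedding space, the same ranking routine serves text-to-image and image-to-text retrieval by swapping its arguments; a keyword query could be handled the same way if keywords are embedded by the text stream, which is again an assumption rather than something the abstract states.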