Artificial intelligence (AI) can automatically delineate lesions on computed tomography (CT) and generate radiology report content, yet progress is limited by the scarcity of publicly available CT datasets with lesion-level annotations. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset comprising two components: a Lesion Image and Metadata Set containing 20,335 lesions from 7,795 CT studies with bounding boxes, descriptions, and size information, and a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization. Hard negative examples are included to reflect real-world diagnostic challenges. We evaluate multiple state-of-the-art multimodal models, including vision-language and medical CLIP variants, by comparing their performance to radiologist assessments, demonstrating the value of CT-Bench as a comprehensive benchmark for lesion analysis. Moreover, fine-tuning models on the Lesion Image and Metadata Set yields significant performance gains across both components, underscoring the clinical utility of CT-Bench.