PETAR：基于掩码感知视觉语言建模的PET自动化报告局部发现生成 (PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting)

Danyal Maqbool,Changhee Lee,Zachary Huemann,Samuel D. Church,Matthew E. Larson,Scott B. Perlman,Tomas A. Romero,Joshua D. Warner,Meghan Lubner,Xin Tie,Jameson Merkow,Junjie Hu,Steve Y. Cho,Tyler J. Bradshaw

Recent advances in vision-language models (VLMs) have enabled impressive multimodal reasoning, yet most medical applications remain limited to 2D imaging. In this work, we extend VLMs to 3D positron emission tomography and computed tomography (PET/CT), a domain characterized by large volumetric data, small and dispersed lesions, and lengthy radiology reports. We introduce a large-scale dataset comprising over 11,000 lesion-level descriptions paired with 3D segmentations from more than 5,000 PET/CT exams, extracted via a hybrid rule-based and large language model (LLM) pipeline. Building upon this dataset, we propose PETAR-4B, a 3D mask-aware vision-language model that integrates PET, CT, and lesion contours for spatially grounded report generation. PETAR bridges global contextual reasoning with fine-grained lesion awareness, producing clinically coherent and localized findings. Comprehensive automated and human evaluations demonstrate that PETAR substantially improves PET/CT report generation quality, advancing 3D medical vision-language understanding.

翻译：近期视觉语言模型（VLMs）的进展已实现卓越的多模态推理能力，但大多数医学应用仍局限于二维成像。本研究将VLMs扩展至三维正电子发射断层扫描与计算机断层扫描（PET/CT）领域，该领域具有大体积数据、小而分散的病灶以及冗长放射学报告的特点。我们引入了一个大规模数据集，包含超过11,000个病灶级描述与来自5,000余次PET/CT检查的三维分割结果，这些数据通过基于规则的混合流程与大型语言模型（LLM）流水线提取。基于此数据集，我们提出PETAR-4B——一个三维掩码感知视觉语言模型，它整合了PET、CT及病灶轮廓信息，实现空间锚定的报告生成。PETAR通过融合全局上下文推理与细粒度病灶感知，生成临床连贯且定位明确的发现。综合自动化与人工评估表明，PETAR显著提升了PET/CT报告生成质量，推动了三维医学视觉语言理解的发展。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

31+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日