Medical image generation is pivotal in applications like data augmentation for low-resource clinical tasks and privacy-preserving data sharing. However, developing a scalable generative backbone for medical imaging requires architectural efficiency, sufficient multi-organ data, and principled evaluation, yet current approaches leave these aspects unresolved. Therefore, we introduce MedVAR, the first autoregressive foundation model that adopts the next-scale prediction paradigm to enable fast, scalable medical image synthesis. MedVAR generates images in a coarse-to-fine manner and produces structured multi-scale representations suitable for downstream use. To support hierarchical generation, we curate a harmonized dataset of around 440,000 CT and MRI images spanning six anatomical regions. Comprehensive experiments across fidelity, diversity, and scalability show that MedVAR achieves state-of-the-art generative performance and offers a promising architectural direction for future medical generative foundation models.
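The next-scale prediction paradigm described above can be sketched as a loop over progressively finer token grids, where each scale is predicted conditioned on an upsampled version of the coarser scales. The snippet below is a minimal illustration of this coarse-to-fine generation loop; the `upsample` prior and the `predict_scale` stand-in for a transformer predictor are hypothetical placeholders, not MedVAR's actual components.

```python
# Illustrative sketch of next-scale (coarse-to-fine) autoregressive generation.
# predict_scale is a placeholder for a learned model; in a real system it would
# be a transformer attending to all coarser-scale tokens.
import numpy as np

rng = np.random.default_rng(0)

def upsample(token_map, new_size):
    """Nearest-neighbor upsample of a square 2D token map to new_size x new_size."""
    idx = np.arange(new_size) * token_map.shape[0] // new_size
    return token_map[np.ix_(idx, idx)]

def predict_scale(context, size, vocab=256):
    """Placeholder predictor: returns a size x size grid of discrete tokens.
    (A real predictor would condition on `context`, the upsampled coarser map.)"""
    return rng.integers(0, vocab, size=(size, size))

scales = [1, 2, 4, 8, 16]  # token-map side lengths, coarse to fine
maps = []
for s in scales:
    # Condition on the previous (coarser) map, upsampled to the current scale.
    context = upsample(maps[-1], s) if maps else None
    maps.append(predict_scale(context, s))

# The finest-scale token grid would then be decoded (e.g., by a VQ decoder)
# into the output image.
print([m.shape for m in maps])
```

Because each scale's tokens can be emitted in parallel and only the number of scales is sequential, this layout is what makes the paradigm fast relative to token-by-token autoregression.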