Medical image generation is pivotal in applications like data augmentation for low-resource clinical tasks and privacy-preserving data sharing. However, developing a scalable generative backbone for medical imaging requires architectural efficiency, sufficient multi-organ data, and principled evaluation, yet current approaches leave these aspects unresolved. Therefore, we introduce MedVAR, the first autoregressive foundation model that adopts the next-scale prediction paradigm to enable fast, scalable medical image synthesis. MedVAR generates images in a coarse-to-fine manner and produces structured multi-scale representations suitable for downstream use. To support hierarchical generation, we curate a harmonized dataset of around 440,000 CT and MRI images spanning six anatomical regions. Comprehensive experiments across fidelity, diversity, and scalability show that MedVAR achieves state-of-the-art generative performance and offers a promising architectural direction for future medical generative foundation models.
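The next-scale prediction paradigm described above can be sketched as a loop over progressively finer token grids, where each scale is predicted conditioned on an upsampled version of the coarser scales. The snippet below is a minimal illustration of this coarse-to-fine generation loop; the `upsample` prior and the `predict_scale` stand-in for a transformer predictor are hypothetical placeholders, not MedVAR's actual components.

```python
# Illustrative sketch of next-scale (coarse-to-fine) autoregressive generation.
# predict_scale is a placeholder for a learned model; in a real system it would
# be a transformer attending to all coarser-scale tokens.
import numpy as np

rng = np.random.default_rng(0)

def upsample(token_map, new_size):
    """Nearest-neighbor upsample of a square 2D token map to new_size x new_size."""
    idx = np.arange(new_size) * token_map.shape[0] // new_size
    return token_map[np.ix_(idx, idx)]

def predict_scale(context, size, vocab=256):
    """Placeholder predictor: returns a size x size grid of discrete tokens.
    (A real predictor would condition on `context`, the upsampled coarser map.)"""
    return rng.integers(0, vocab, size=(size, size))

scales = [1, 2, 4, 8, 16]  # token-map side lengths, coarse to fine
maps = []
for s in scales:
    # Condition on the previous (coarser) map, upsampled to the current scale.
    context = upsample(maps[-1], s) if maps else None
    maps.append(predict_scale(context, s))

# The finest-scale token grid would then be decoded (e.g., by a VQ decoder)
# into the output image.
print([m.shape for m in maps])
```

Because each scale's tokens can be emitted in parallel and only the number of scales is sequential, this layout is what makes the paradigm fast relative to token-by-token autoregression.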