Diffusion models have recently gained significant traction due to their ability to generate high-fidelity and diverse images and videos conditioned on text prompts. In medicine, this capability promises to address the critical challenge of data scarcity, a consequence of barriers to data sharing, stringent patient privacy regulations, and disparities in patient populations and demographics. By generating realistic and varied 2D and 3D medical images, these models offer a rich, privacy-respecting resource for algorithm training and research. To this end, we introduce MediSyn, a pair of instruction-tuned, text-guided latent diffusion models that generate high-fidelity and diverse 2D and 3D medical images across specialties and modalities. Using established metrics, we show significant improvement in broad, text-guided medical image and video synthesis.