Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data source. While Generative Adversarial Networks (GANs) have provided innovative approaches for histopathological image analysis, they suffer from limitations such as mode collapse and discriminator overfitting. Recently, denoising diffusion models have demonstrated promising results in computer vision. These models exhibit superior training stability and better distribution coverage, and they produce high-quality, diverse images. Additionally, they are highly resilient to noise and perturbations, making them well suited to digital pathology, where images commonly contain artifacts and exhibit significant variations in staining. In this paper, we present a novel approach, ViT-DAE, which integrates vision transformers (ViT) and diffusion autoencoders (DAE) for high-quality histopathology image synthesis. This marks the first time that ViT has been introduced to diffusion autoencoders in computational pathology, allowing the model to better capture the complex and intricate details of histopathology images. We demonstrate the effectiveness of ViT-DAE on three publicly available datasets. Our approach outperforms recent GAN-based and vanilla DAE methods in generating realistic images.