Multiscale Vector-Quantized Variational Autoencoder for Endoscopic Image Synthesis

Gastrointestinal (GI) imaging via Wireless Capsule Endoscopy (WCE) generates a large number of images that require manual screening. Deep learning-based Clinical Decision Support (CDS) systems can assist screening, yet their performance depends on the availability of large, diverse medical training datasets. The scarcity of such data, due to privacy constraints and annotation costs, hinders CDS development. Generative machine learning offers a viable way to address this limitation. Although current Synthetic Data Generation (SDG) methods, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have been explored, they often face challenges with training stability and with capturing sufficient visual diversity, especially when synthesizing abnormal findings. This work introduces a novel VAE-based methodology for medical image synthesis and presents its application to the generation of WCE images. The novel contributions of this work include a) a multiscale extension of the Vector Quantized VAE model, named Multiscale Vector Quantized Variational Autoencoder (MSVQ-VAE); b) the use of MSVQ-VAE, unlike other VAE-based SDG models for WCE image generation, to seamlessly introduce abnormalities into normal WCE images; c) conditional generation of synthetic images, which allows different types of abnormalities to be introduced into normal WCE images; d) experiments with a variety of abnormality types, including polyps and vascular and inflammatory conditions. The utility of the generated images for CDS is assessed via image classification. Comparative experiments demonstrate that training a CDS classifier on the abnormal images generated by the proposed methodology yields results comparable to those of a classifier trained only on real data. The generality of the proposed methodology suggests that it is applicable to various domains related to medical multimedia.
