Here we present a structural similarity index measure (SSIM)-guided conditional generative adversarial network (cGAN) that performs image-to-image (i2i) synthesis to generate photo-accurate protein channels in multiplexed spatial proteomics images. This approach can be used to accurately generate spatial proteomics channels that were not included during experimental data collection, either at the bench or in the clinic. Experimental spatial proteomics data from the Human BioMolecular Atlas Program (HuBMAP) were used to generate spatial representations of missing proteins through a U-Net-based image synthesis pipeline. HuBMAP channels were hierarchically clustered using SSIM as a heuristic to obtain the minimal set of channels needed to recapitulate the underlying biology represented by the spatial landscape of proteins. We then show that our SSIM-based architecture scales generative image synthesis to slides with up to 100 channels, whereas current state-of-the-art algorithms are limited to data with 11 channels. We validate these claims by generating a new experimental spatial proteomics dataset from human lung adenocarcinoma tissue sections and showing that a model trained on HuBMAP can accurately synthesize channels from this new dataset. The ability to recapitulate experimental data from sparsely stained multiplexed histological slides containing spatial proteomic data will have a tremendous impact on medical diagnostics and drug development, and it also raises important questions about the medical ethics of using data produced by generative image synthesis in the clinical setting. The algorithm presented in this paper will allow researchers and clinicians to save time and cost in proteomics-based histological staining while also increasing the amount of data they can generate from their experiments.
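The channel-selection step (hierarchical clustering of channels by SSIM) can be illustrated with a small sketch. The abstract does not specify the SSIM window, linkage criterion, cut threshold, or how a representative channel is chosen, so everything below is an assumption: a 7x7 uniform-window SSIM on images scaled to [0, 1], average linkage on the distance 1 - SSIM, a hypothetical cut threshold of 0.5, and, as each cluster's representative, the channel with the highest summed SSIM to its cluster mates. The names `ssim` and `cluster_channels` are hypothetical and not taken from the authors' code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(a: np.ndarray, b: np.ndarray, win: int = 7, data_range: float = 1.0) -> float:
    """Mean SSIM over all valid win x win windows (uniform weights; assumed settings)."""
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    wa = sliding_window_view(a, (win, win))  # shape (H-win+1, W-win+1, win, win)
    wb = sliding_window_view(b, (win, win))
    mu_a = wa.mean(axis=(-2, -1))
    mu_b = wb.mean(axis=(-2, -1))
    var_a = wa.var(axis=(-2, -1))
    var_b = wb.var(axis=(-2, -1))
    cov = (wa * wb).mean(axis=(-2, -1)) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    return float((num / den).mean())


def cluster_channels(stack: np.ndarray, threshold: float = 0.5):
    """Average-linkage agglomerative clustering of channels on distance 1 - SSIM.

    stack: array of shape (n_channels, H, W). Merging stops once the closest
    pair of clusters is farther apart than `threshold` (hypothetical value).
    Returns (clusters, representatives).
    """
    n = stack.shape[0]
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = ssim(stack[i], stack[j])
    dist = 1.0 - sim

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = dist[np.ix_(clusters[p], clusters[q])].mean()
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid

    # Representative: the member most similar (summed SSIM) to the rest of its cluster.
    reps = [max(c, key=lambda i: sim[i, c].sum()) for c in clusters]
    return clusters, reps
```

With a synthetic stack in which channels 0 and 1 are noisy copies of one random pattern and channels 2 and 3 are noisy copies of another, `cluster_channels` groups them as {0, 1} and {2, 3} and keeps one channel from each group; in the paper's setting, the retained representatives would play the role of the minimal input set.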