PriorPath: Coarse-To-Fine Approach for Controlled De-Novo Pathology Semantic Masks Generation

Incorporating artificial intelligence (AI) into digital pathology offers promising prospects for automating and enhancing tasks such as image analysis and diagnosis. However, the diversity of tissue samples and the need for meticulous image labeling often result in biased datasets, which limits the applicability of algorithms trained on them. To use synthetic histopathological images to address this challenge, it is essential not only to produce photorealistic images but also to control the cellular characteristics they depict. Previous studies generated semantic masks from random noise to capture the spatial distribution of the tissue. These masks were then used as a prior for conditional generative approaches that produce photorealistic histopathological images. However, as with many other generative models, this solution is prone to mode collapse: the model fails to capture the full diversity of the underlying data distribution. In this work, we present a pipeline, coined PriorPath, that generates detailed, realistic semantic masks from coarse-grained images delineating tissue regions. This approach enables control over the spatial arrangement of the generated masks and, consequently, of the resulting synthetic images. We demonstrate the efficacy of our method on three cancer types (skin, prostate, and lung), showing that PriorPath covers the semantic mask space and achieves higher similarity to real masks than previous methods. Our approach allows users to specify desired tissue distributions and to obtain both realistic masks and photorealistic images within a single platform, thus providing a state-of-the-art, controllable solution for generating histopathological images to facilitate AI for computational pathology.
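To make the idea of a coarse-grained prior concrete, the following minimal sketch builds a blocky tissue-region map and reports the tissue distribution it encodes. It is only an illustration under stated assumptions, not the PriorPath implementation: the label names, the grid layout, and the helpers `make_coarse_prior` and `tissue_fractions` are hypothetical and do not come from the paper.

```python
import numpy as np

# Hypothetical tissue classes; the abstract does not list label names.
TISSUE_LABELS = {0: "background", 1: "stroma", 2: "tumor"}


def make_coarse_prior(grid, cell_size):
    """Upsample a small grid of tissue labels into a blocky region map.

    Each grid cell becomes a cell_size x cell_size block of a single label,
    giving a coarse image that only delineates tissue regions.
    """
    grid = np.asarray(grid, dtype=np.uint8)
    rows = np.repeat(grid, cell_size, axis=0)
    return np.repeat(rows, cell_size, axis=1)


def tissue_fractions(mask, num_classes):
    """Return the fraction of pixels assigned to each tissue label."""
    counts = np.bincount(mask.ravel(), minlength=num_classes)
    return counts / mask.size


# Assumed layout: a tumor core surrounded by stroma, background in the corners.
grid = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
    [0, 1, 1, 0],
]

coarse = make_coarse_prior(grid, cell_size=64)
fractions = tissue_fractions(coarse, num_classes=len(TISSUE_LABELS))

for label, name in TISSUE_LABELS.items():
    print(f"{name}: {fractions[label]:.2f}")
```

In a pipeline of the kind described above, a map like `coarse` would serve as the conditioning input to a mask generator, which would refine it into a detailed semantic mask before image synthesis. Both generative stages are omitted here because the abstract does not specify them.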
