Poor performance of quantitative analysis on histopathological Whole Slide Images (WSIs) has been a significant obstacle in clinical practice. Manually annotating large-scale WSIs is demanding and time-consuming, and is unlikely to yield the expected results when used for fully supervised learning systems. Rarely observed disease patterns and large differences in object scales are difficult to model through conventional patient intake. Prior methods either fall back to direct disease classification, which only requires learning a few factors per image, or report average image segmentation performance, which is highly biased towards majority observations. Geometric image augmentation is commonly used to improve robustness for average-case predictions and to enrich limited datasets. So far, no method has provided sampling from a realistic posterior distribution to improve stability, e.g., for the segmentation of imbalanced objects within images. Therefore, we propose a new approach, based on diffusion models, that can enrich an imbalanced dataset with plausible examples from underrepresented groups by conditioning on segmentation maps. Our method can readily expand limited clinical datasets, making them suitable for training machine learning pipelines, and provides an interpretable and human-controllable way of generating histopathology images that are indistinguishable from real ones to human experts. We validate our findings on two datasets, one from the public domain and one from a kidney transplant study.