We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range structural correlations. Our approach first generates synthetic segmentation masks, which then serve as conditions for the high-fidelity generative diffusion process. The proposed sampling method scales to any desired image size while requiring only small patches for training, which keeps training fast. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts. Training leverages classifier-free guidance to augment a small, sparsely annotated dataset with unlabelled data. Our method addresses challenges specific to histopathological imaging practice: large-scale information, costly manual annotation, and protective data handling. The biological plausibility of DiffInfinite data is validated through a survey of ten experienced pathologists and through a downstream segmentation task. Furthermore, the model scores strongly on anti-copying metrics, which is beneficial for the protection of patient data.
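As an illustration of the classifier-free guidance mentioned above, the following minimal sketch shows the two ingredients of the standard formulation: randomly replacing labels with a null token during training, and combining conditional and unconditional noise predictions at sampling time. It is not the DiffInfinite implementation. The function names, the null-label value, the drop probability, and the guidance weight `w` are all hypothetical, and treating unlabelled patches as null-labelled is an assumption about how such data could enter training.

```python
import numpy as np

NULL_LABEL = -1  # hypothetical token meaning "no condition"


def drop_labels(labels, p_uncond, rng, null_label=NULL_LABEL):
    """Replace each label with the null token with probability p_uncond.

    Training on a mix of labelled and null-labelled samples lets one network
    learn both the conditional and the unconditional noise prediction.
    Unlabelled patches could simply always carry the null label (assumption).
    """
    labels = np.asarray(labels)
    drop = rng.random(labels.shape[0]) < p_uncond
    return np.where(drop, null_label, labels)


def guided_noise(eps_cond, eps_uncond, w):
    """Standard classifier-free guidance combination of two noise predictions.

    w = 0 recovers the purely conditional prediction; larger w pushes the
    sample further toward the condition.
    """
    return (1.0 + w) * eps_cond - w * eps_uncond


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0, 1, 2, 1, 0, 2])
    print(drop_labels(labels, p_uncond=0.5, rng=rng))

    # Stand-ins for the outputs of a denoising network on one small patch.
    eps_c = rng.normal(size=(3, 8, 8))
    eps_u = rng.normal(size=(3, 8, 8))
    print(guided_noise(eps_c, eps_u, w=2.0).shape)
```

In a real sampler the two predictions would come from the same network evaluated once with the condition and once with the null token; here they are random arrays so that the sketch stays self-contained.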