We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range structural correlations. Our approach first generates synthetic segmentation masks, which are then used as conditions for the high-fidelity generative diffusion process. The proposed sampling method scales to any desired image size while requiring only small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artifacts. The training leverages classifier-free guidance to augment a small, sparsely annotated dataset with unlabelled data. Our method alleviates challenges unique to histopathological imaging practice: large-scale information, costly manual annotation, and protective data handling. The biological plausibility of DiffInfinite data is evaluated in a survey by ten experienced pathologists as well as in a downstream classification and segmentation task. Samples from the model score strongly on anti-copying metrics, which is relevant for the protection of patient data.
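The abstract mentions classifier-free guidance as the mechanism for combining a small labelled dataset with unlabelled data. The sketch below illustrates the generic form of that technique, not the authors' implementation: during training, labels are replaced by a reserved null label for unlabelled samples (and at random for labelled ones), and during sampling the conditional and unconditional noise predictions are combined linearly. The names `NULL_LABEL`, `prepare_labels`, `guided_noise`, `drop_prob`, and `guidance_scale` are hypothetical and chosen only for illustration.

```python
import numpy as np

NULL_LABEL = 0  # assumption: index 0 is reserved for "no label / unlabelled"


def prepare_labels(labels, is_labelled, drop_prob, rng):
    """Return training labels for classifier-free guidance.

    Unlabelled samples always receive NULL_LABEL. Labelled samples
    receive NULL_LABEL with probability drop_prob, so a single model
    learns both conditional and unconditional predictions.
    """
    labels = np.asarray(labels)
    is_labelled = np.asarray(is_labelled, dtype=bool)
    dropped = rng.random(labels.shape) < drop_prob
    keep = is_labelled & ~dropped
    return np.where(keep, labels, NULL_LABEL)


def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    guidance_scale = 0 gives the unconditional prediction,
    guidance_scale = 1 gives the conditional prediction, and
    larger values push samples further toward the condition.
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Under these assumptions, unlabelled patches contribute to training only through the null label, which is how unlabelled data can augment a sparsely annotated set. Other parameterizations of the guidance weight exist, so the scale used here may differ by a constant offset from the one in a given codebase.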