Deep learning-based image generation has advanced significantly with diffusion models, which have notably improved the quality of generated images. Despite these developments, generating images with unseen characteristics that benefit downstream tasks has received limited attention. To bridge this gap, we propose Style-Extracting Diffusion Models, which feature two conditioning mechanisms. Specifically, we use 1) a style conditioning mechanism that allows style information from previously unseen images to be injected during image generation and 2) a content conditioning mechanism that can be targeted to a downstream task, e.g., layout for segmentation. We introduce a trainable style encoder to extract style information from images, and an aggregation block that merges style information from multiple style inputs. By leveraging styles from unseen images, this architecture generates images with unseen styles in a zero-shot manner, resulting in more diverse generations. In this work, we use the image layout as the target condition and first demonstrate the capability of our method on a natural image dataset as a proof of concept. We further demonstrate its versatility in histopathology, where we combine prior knowledge about tissue composition with unannotated data to create diverse synthetic images with known layouts. This allows us to generate additional synthetic data for training a segmentation network in a semi-supervised fashion. We verify the added value of the generated images by showing improved segmentation results and lower performance variability between patients when synthetic images are included during segmentation training. Our code will be made publicly available at [LINK].
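As a rough illustration of the two conditioning inputs described in the abstract, the following minimal sketch merges several style embeddings into one and pairs the result with a layout. It assumes that the aggregation block can be approximated by element-wise averaging, which is an illustrative choice rather than the block used in this work. The names `aggregate_styles` and `build_conditioning` are hypothetical, and the style vectors stand in for the outputs of the trainable style encoder.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def aggregate_styles(style_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Merge several style embeddings into one by element-wise averaging.

    Hypothetical stand-in for the aggregation block: averaging is an
    assumption made for illustration, not the method from the paper.
    """
    if not style_vectors:
        raise ValueError("need at least one style vector")
    dim = len(style_vectors[0])
    if any(len(v) != dim for v in style_vectors):
        raise ValueError("all style vectors must share one dimension")
    # Average each embedding dimension across all style inputs.
    return [fmean(v[i] for v in style_vectors) for i in range(dim)]


def build_conditioning(
    style_vectors: Sequence[Sequence[float]],
    layout: Sequence[Sequence[int]],
) -> Dict[str, object]:
    """Bundle the two conditions: aggregated style and content (layout)."""
    return {"style": aggregate_styles(style_vectors), "content": layout}


# Two toy style embeddings (as if produced by a style encoder) and a
# 2x2 layout mask with class indices.
conditioning = build_conditioning([[1.0, 2.0], [3.0, 4.0]], [[0, 1], [1, 1]])
print(conditioning)
```

Averaging makes the merged style independent of the order of the style inputs, which is one reasonable property for combining several unseen style images. A learned aggregation would replace `aggregate_styles` without changing how the layout condition is passed along.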