Deep Learning has become a ubiquitous paradigm due to its extraordinary effectiveness and applicability in numerous domains. However, the approach requires large amounts of data for its models to reach their full potential. Image Synthesis, a rapidly growing sub-field of Artificial Intelligence, aims to address this limitation by designing intelligent models capable of creating original and realistic images, an endeavour which could drastically reduce the need for real data. The Stable Diffusion generation paradigm has recently propelled state-of-the-art approaches beyond all previous benchmarks. In this work, we propose the ContRail framework, based on the novel Stable Diffusion model ControlNet, which we empower through a multi-modal conditioning method. We experiment with the task of synthetic railway image generation, where we improve performance on rail-specific tasks, such as rail semantic segmentation, by enriching the dataset with realistic synthetic images.