Data augmentation is one of the most common tools in deep learning and underpins many recent advances in tasks such as classification, detection, and semantic segmentation. The standard approach applies simple transformations such as rotation and flipping to generate new images. However, these new images often lack diversity along the main semantic dimensions of the data: traditional augmentation methods cannot alter high-level semantic attributes, such as the presence of vehicles, trees, and buildings in a scene. In recent years, the rapid development of generative models has opened new possibilities for data augmentation. In this paper, we address the lack of diversity in data augmentation for the road detection task by using a pre-trained text-to-image diffusion model to parameterize image-to-image transformations. Our method edits images with the diffusion model to change their semantics. Specifically, we erase instances of real objects from images in the original dataset and use the diffusion model to generate new instances with similar semantics in the erased regions, thereby expanding the original dataset. We evaluate our approach on the KITTI road dataset, where it achieves the best results among the data augmentation methods compared, demonstrating the effectiveness of the proposed method.
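The erase-and-regenerate step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `erase_and_regenerate` and `dummy_inpaint` are hypothetical, and `inpaint_fn` is an assumed stand-in for a pre-trained diffusion inpainting model. The sketch only shows the compositing logic, in which pixels outside the object mask are left untouched.

```python
import numpy as np


def erase_and_regenerate(image, instance_mask, inpaint_fn):
    """Replace one masked object region with newly generated content.

    image:         (H, W, 3) uint8 array.
    instance_mask: (H, W) boolean array, True where the object to replace is.
    inpaint_fn:    hypothetical callable (image, mask) -> (H, W, 3) array,
                   an assumed placeholder for a diffusion inpainting model.
    """
    mask = instance_mask.astype(bool)

    # Step 1: erase the real object instance.
    erased = image.copy()
    erased[mask] = 0

    # Step 2: let the generator fill the erased region.
    generated = inpaint_fn(erased, mask)

    # Step 3: composite so that only masked pixels change.
    out = np.where(mask[..., None], generated, image)
    return out.astype(image.dtype)


def dummy_inpaint(image, mask):
    """Toy generator (assumption for illustration only): fills the masked
    region with the mean colour of the unmasked pixels."""
    fill = image[~mask].mean(axis=0)
    out = image.copy()
    out[mask] = fill.astype(image.dtype)
    return out


if __name__ == "__main__":
    img = np.full((4, 4, 3), 100, dtype=np.uint8)
    obj = np.zeros((4, 4), dtype=bool)
    obj[1:3, 1:3] = True
    img[obj] = 255  # a bright "object" on a uniform background
    augmented = erase_and_regenerate(img, obj, dummy_inpaint)
    print(augmented[:, :, 0])
```

In practice, `dummy_inpaint` would be swapped for a call to a real inpainting model conditioned on a text prompt. Because the sketch modifies only the masked region, labels for pixels outside the mask could plausibly be reused for the augmented image, although the abstract does not state how labels are handled.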