Deep learning research is moving towards strategies that make fuller use of the available data. Traditionally, the emphasis has been on scaling model architectures, resulting in large and complex neural networks that can be difficult to train with limited computational resources. However, independently of model size, data quality (i.e. amount and variability) remains a major factor affecting model generalization. In this work, we propose a novel technique to exploit available data through automatic data augmentation for the tasks of image classification and semantic segmentation. We introduce the first Differentiable Augmentation Search method (DAS) to generate variations of images that can be processed as videos. Compared to previous approaches, DAS is extremely fast and flexible, allowing search over very large search spaces in less than one GPU day. Our intuition is that the increased receptive field in the temporal dimension provided by DAS could also benefit the spatial receptive field. More specifically, we leverage DAS to guide the reshaping of the spatial receptive field by selecting task-dependent transformations. As a result, compared to standard augmentation alternatives, DAS improves accuracy on the ImageNet, Cifar10, Cifar100, Tiny-ImageNet, Pascal-VOC-2012 and CityScapes datasets when plugged into different lightweight video backbones.
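To make the idea of "variations of images that can be processed as videos" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a small hypothetical set of candidate transformations (`OPS`) and a DARTS-style softmax relaxation in which each frame of the clip is a weighted mixture of the transformed images. The function name `image_to_clip`, the choice of operations, and the `logits` parameterization are all illustrative assumptions; in an autodiff framework the logits could be learned by gradient descent, which NumPy alone does not provide.

```python
import numpy as np

# Hypothetical candidate transformations; the abstract does not list the real search space.
def identity(img):
    return img

def hflip(img):
    # Mirror along the width axis (axis 1 of an H x W x C array).
    return img[:, ::-1, :]

def shift_right(img, pixels=2):
    # Cyclic shift along the width axis, a simple stand-in for a translation.
    return np.roll(img, pixels, axis=1)

def brighten(img, delta=0.1):
    # Add a constant and clamp to the [0, 1] intensity range.
    return np.clip(img + delta, 0.0, 1.0)

OPS = [identity, hflip, shift_right, brighten]

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def image_to_clip(img, logits):
    """Build a (T, H, W, C) clip from one (H, W, C) image.

    logits has shape (T, len(OPS)); row t holds the (assumed) learnable
    scores that decide which transformation dominates frame t.
    """
    weights = softmax(logits, axis=-1)               # (T, K)
    candidates = np.stack([op(img) for op in OPS])   # (K, H, W, C)
    # Frame t is the weighted sum over k of weights[t, k] * candidates[k].
    return np.einsum("tk,khwc->thwc", weights, candidates)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))          # toy image with values in [0, 1)
logits = np.zeros((4, len(OPS)))       # uniform mixture for each of 4 frames
clip = image_to_clip(image, logits)    # shape (4, 8, 8, 3), ready for a video backbone
```

With uniform logits every frame is the average of all candidates; as one logit in a row grows, that frame approaches the corresponding single transformation, which is how a relaxed search of this kind would converge to a discrete choice.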