Unsupervised Domain Adaptation (UDA) and domain generalization (DG) are two research areas that aim to tackle the lack of generalization of Deep Neural Networks (DNNs) towards unseen domains. While UDA methods have access to unlabeled target images, domain generalization does not use any target data and learns generalized features from a source domain only. Image-style randomization or augmentation is a popular approach to improve network generalization without access to the target domain. Yet complex methods are often proposed that disregard the potential of simple image augmentations for out-of-domain generalization. For this reason, we systematically study the in- and out-of-domain generalization capabilities of simple, rule-based image augmentations such as blur, noise, color jitter, and many more. Based on a full factorial design of experiments, we provide a systematic statistical evaluation of augmentations and their interactions. Our analysis yields both expected and unexpected outcomes. Expected, because our experiments confirm the common finding that combining multiple different augmentations outperforms single augmentations. Unexpected, because combined augmentations perform competitively with state-of-the-art domain generalization approaches while being significantly simpler and adding no training overhead. On the challenging synthetic-to-real domain shift from Synthia to Cityscapes, we reach 39.5% mIoU, compared to 40.9% mIoU for the best previous work. When additionally employing the recent vision transformer architecture DAFormer, we outperform these benchmarks with 44.2% mIoU.
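In a full factorial design, every combination of factor levels is evaluated, so k on/off augmentations give 2^k runs, and main effects and interactions can be estimated from contrasts over those runs. The following is a minimal sketch under stated assumptions: each augmentation is treated as a two-level factor coded -1 (off) / +1 (on), and each run yields one score such as mIoU. The factor list, the function names, and the toy scoring function `toy_miou` are hypothetical illustrations, not the paper's setup (which may use more levels), and the numbers it produces are not results.

```python
from itertools import product
from statistics import mean

# Hypothetical factor list: augmentations named in the abstract, each treated
# as a two-level factor coded -1 (off) or +1 (on).
FACTORS = ["blur", "noise", "color_jitter"]


def full_factorial(factors):
    """Return every off/on combination of the factors (2**k runs)."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def effect(runs, responses, *factors):
    """Estimate a main effect (one factor) or an interaction (several factors).

    The estimate is the mean response of runs where the product of the coded
    levels is +1, minus the mean response of runs where that product is -1.
    """
    plus, minus = [], []
    for run, y in zip(runs, responses):
        sign = 1
        for f in factors:
            sign *= run[f]
        (plus if sign > 0 else minus).append(y)
    return mean(plus) - mean(minus)


def toy_miou(run):
    # Hypothetical additive placeholder score, NOT data from the paper.
    on = {f: run[f] > 0 for f in run}
    return 30.0 + 2.0 * on["blur"] + 1.0 * on["noise"] + 3.0 * on["color_jitter"]


runs = full_factorial(FACTORS)
scores = [toy_miou(r) for r in runs]
print(len(runs))                              # 8
print(effect(runs, scores, "blur"))           # 2.0
print(effect(runs, scores, "blur", "noise"))  # 0.0
```

Because the toy score is purely additive, the main effect of each factor recovers its coefficient and every interaction estimate is zero; with real training runs, a nonzero interaction would indicate that two augmentations help (or hurt) more together than their individual effects suggest.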