Generalization under domain shifts, which occur frequently in applications such as autonomous driving, remains one of the major open challenges for deep learning models. We therefore propose an intra-source style augmentation (ISSA) method to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The model learns to faithfully reconstruct an image while preserving its semantic layout through noise prediction. Random masking of the estimated noise gives the model its style-mixing capability, i.e., it makes it possible to alter the global appearance of an image without affecting its semantic layout. By using the proposed masked noise encoder to randomize style and content combinations in the training set, ISSA effectively increases the diversity of the training data and reduces spurious correlations. As a result, we achieve improvements of up to $12.4\%$ mIoU on driving-scene semantic segmentation under different types of data shift, i.e., changing geographic locations, adverse weather conditions, and day-to-night transitions. ISSA is model-agnostic and straightforwardly applicable to both CNNs and Transformers. It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by $3\%$ mIoU when generalizing from Cityscapes to Dark Z\"urich.
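To make the idea of randomly masking the estimated noise more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the noise map is split into non-overlapping square patches and that a fixed fraction of them is replaced by unit Gaussian samples. The function name `mask_noise_map`, the parameters `patch` and `ratio`, and the choice of Gaussian replacement are illustrative assumptions; the abstract states only that the estimated noise is randomly masked.

```python
import random


def mask_noise_map(noise, patch=4, ratio=0.25, rng=None):
    """Replace a random subset of patch x patch blocks with N(0, 1) samples.

    Hypothetical helper for illustration only.
    noise: 2D list (H x W) of floats; H and W must be divisible by `patch`.
    Returns (masked_copy, mask), where mask[r][c] is True if patch (r, c)
    was replaced. The input noise map is left unmodified.
    """
    rng = rng or random.Random()
    h, w = len(noise), len(noise[0])
    if h % patch or w % patch:
        raise ValueError("noise map size must be divisible by patch size")

    rows, cols = h // patch, w // patch
    n_masked = round(ratio * rows * cols)
    # Pick the patches to mask, without replacement.
    chosen = set(rng.sample(range(rows * cols), n_masked))

    out = [row[:] for row in noise]  # copy so the caller's data is untouched
    mask = [[(r * cols + c) in chosen for c in range(cols)] for r in range(rows)]

    for idx in chosen:
        r0, c0 = (idx // cols) * patch, (idx % cols) * patch
        for i in range(r0, r0 + patch):
            for j in range(c0, c0 + patch):
                # Assumption: masked entries are resampled from N(0, 1).
                out[i][j] = rng.gauss(0.0, 1.0)
    return out, mask


if __name__ == "__main__":
    estimated = [[0.0] * 8 for _ in range(8)]  # stand-in for an encoder output
    masked, mask = mask_noise_map(estimated, patch=4, ratio=0.5,
                                  rng=random.Random(0))
    print(sum(flag for row in mask for flag in row), "of 4 patches masked")
```

Under these assumptions, the unmasked patches keep the spatial detail predicted by the encoder, while the masked patches carry no image-specific information, so an encoder trained this way cannot rely on the noise map alone to reconstruct the image.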