Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization

Generalization under domain shifts, which frequently occur in applications such as autonomous driving, remains one of the major open challenges for deep learning models. We therefore propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. Through noise prediction, the model learns to faithfully reconstruct the image while preserving its semantic layout. Using the proposed masked noise encoder to randomize style and content combinations in the training set, intra-source style augmentation (ISSA) effectively increases the diversity of the training data and reduces spurious correlations. As a result, we achieve mIoU improvements of up to $12.4\%$ on driving-scene semantic segmentation under different types of data shift, i.e., changing geographic locations, adverse weather conditions, and day to night. ISSA is model-agnostic and straightforwardly applicable to CNNs and Transformers. It is also complementary to other domain generalization techniques; e.g., it improves the recent state-of-the-art method RobustNet by $3\%$ mIoU on Cityscapes to Dark Z\"urich. In addition, we demonstrate the strong plug-and-play ability of the proposed style synthesis pipeline, which is readily usable with extra-source exemplars, e.g., web-crawled images, without any retraining or fine-tuning. Moreover, we study a new use case: indicating a neural network's generalization capability by building a stylized proxy validation set. This application is of significant practical value for selecting models to be deployed in open-world environments. Our code is available at \url{https://github.com/boschresearch/ISSA}.
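The core data flow of ISSA can be illustrated with a toy sketch. It pairs the content of each training image with the style of another image and reuses the original label map, because the layout comes from the content part. This is a minimal sketch under loudly stated assumptions. The hypothetical `encode` and `generate` functions stand in for the masked noise encoder and the StyleGAN2 generator. They use per-channel mean/std statistics, an AdaIN-like swap, instead of StyleGAN2 latents and predicted noise maps. The sketch therefore shows only how styles and contents are recombined, not the paper's actual model.

```python
import numpy as np


def encode(image):
    """Hypothetical stand-in for the masked noise encoder (NOT the paper's model).

    Splits an H x W x C image into a global "style" code (per-channel mean and
    std) and a spatial "content" map (the normalized image). In the real
    pipeline, the style would be a StyleGAN2 latent and the content would be
    the predicted noise maps that preserve the semantic layout.
    """
    mean = image.mean(axis=(0, 1))
    std = image.std(axis=(0, 1)) + 1e-6  # avoid division by zero
    content = (image - mean) / std
    return np.concatenate([mean, std]), content


def generate(style, content):
    """Hypothetical stand-in for the StyleGAN2 generator: re-applies a style code to a content map."""
    c = content.shape[-1]
    mean, std = style[:c], style[c:]
    return content * std + mean


def intra_source_style_augmentation(images, labels, rng):
    """Combine the content of image i with the style of image perm[i].

    Labels are returned unchanged because the content map, and hence the
    semantic layout, always comes from the original image i.
    """
    codes = [encode(img) for img in images]
    perm = rng.permutation(len(images))
    augmented = [generate(codes[j][0], codes[i][1]) for i, j in enumerate(perm)]
    return augmented, list(labels), perm
```

The same recombination covers the extra-source case. Style codes taken from external exemplars, such as web-crawled images, could replace the permuted in-dataset styles without retraining the encoder. In this toy, that means calling `generate(encode(web_image)[0], content)`.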