To overcome the domain gap between synthetic and real-world datasets, unsupervised domain adaptation methods have been proposed for semantic segmentation. The majority of previous approaches attempt to reduce the gap at either the pixel level or the feature level, disregarding the fact that the two levels of alignment interact positively. To address this, we present CONtrastive FEaTure and pIxel alignment (CONFETI), which bridges the domain gap at both the pixel and feature levels using a unique contrastive formulation. We introduce well-estimated prototypes that incorporate category-wise cross-domain information to link the two alignments: pixel-level alignment is achieved by a jointly trained style transfer module with prototypical semantic consistency, while feature-level alignment is enforced on cross-domain features through \textbf{pixel-to-prototype contrast}. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods when using DeepLabV2. Our code is available at https://github.com/cxa9264/CONFETI
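To make the idea of a pixel-to-prototype contrast concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the CONFETI implementation. It assumes that each class prototype is the mean of L2-normalized features pooled from source pixels (ground-truth labels) and target pixels (pseudo-labels), which is one simple way to include category-wise cross-domain information. It further assumes that the contrast is an InfoNCE-style cross-entropy over cosine similarities between each pixel feature and all prototypes, with a temperature `tau = 0.1` and an ignore label of `-1`. The helper names `compute_prototypes` and `pixel_to_prototype_loss` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each feature vector to unit length (zero vectors stay zero).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def compute_prototypes(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    """Hypothetical cross-domain prototypes: class-wise mean of normalized
    features pooled from both domains. Feats: (N, D); labels: (N,), -1 = ignore."""
    feats = l2_normalize(np.concatenate([src_feats, tgt_feats], axis=0))
    labels = np.concatenate([src_labels, tgt_pseudo], axis=0)
    protos = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(axis=0)
    return l2_normalize(protos)


def pixel_to_prototype_loss(feats, labels, prototypes, tau=0.1):
    """Assumed InfoNCE form: each pixel is pulled toward its own class
    prototype (positive) and pushed from all other prototypes (negatives)."""
    valid = labels >= 0
    if not valid.any():
        return 0.0
    f = l2_normalize(feats[valid])                          # (M, D)
    y = labels[valid]                                       # (M,)
    logits = f @ prototypes.T / tau                         # (M, C) cosine / tau
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(y)), y].mean())


# Toy usage: synthetic "source" and "target" pixel features around shared class centers.
rng = np.random.default_rng(0)
C, D = 3, 8
centers = rng.normal(size=(C, D))
src_labels = rng.integers(0, C, size=200)
src_feats = centers[src_labels] + 0.1 * rng.normal(size=(200, D))
tgt_labels = rng.integers(0, C, size=200)
tgt_feats = centers[tgt_labels] + 0.1 * rng.normal(size=(200, D))

protos = compute_prototypes(src_feats, src_labels, tgt_feats, tgt_labels, C)
loss_aligned = pixel_to_prototype_loss(tgt_feats, tgt_labels, protos)
loss_shuffled = pixel_to_prototype_loss(tgt_feats, rng.permutation(tgt_labels), protos)
print(loss_aligned, loss_shuffled)
```

In this toy setting, features that agree with their labels yield a lower loss than the same features paired with shuffled labels, which is the behavior the contrast is meant to reward. Pooling both domains into one prototype per class is a design choice made here for simplicity; a real system might instead use momentum-updated or confidence-weighted prototypes.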