Modeling Hierarchical Structural Distance for Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to learn a transferable model for unlabeled target domains by exploiting labeled source data. Optimal Transport (OT) based methods have recently been shown to be a promising solution for UDA, offering a solid theoretical foundation and competitive performance. However, most of these methods focus solely on domain-level OT alignment, leveraging the geometry of domains to obtain domain-invariant features from the global embeddings of images. Global representations may destroy image structure, leading to the loss of local details that carry category-discriminative information. This study proposes an end-to-end Deep Hierarchical Optimal Transport method (DeepHOT), which aims to learn both domain-invariant and category-discriminative representations by mining hierarchical structural relations among domains. The main idea is to incorporate a domain-level OT and an image-level OT into a unified OT framework, hierarchical optimal transport, to model the underlying geometry in both the domain space and the image space. In the DeepHOT framework, the image-level OT serves as the ground distance metric for the domain-level OT, yielding a hierarchical structural distance. Compared with the ground distance of conventional domain-level OT, the image-level OT captures structural associations among local regions of images that are beneficial to classification. In this way, DeepHOT not only aligns domains through the domain-level OT, but also enhances discriminative power through the image-level OT. Moreover, to overcome the high computational complexity, we propose a robust and efficient implementation of DeepHOT that approximates the original OT with the sliced Wasserstein distance in the image-level OT and carries out the domain-level OT as a mini-batch unbalanced OT.
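
The hierarchical structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each image is represented as a set of local region features of shape (n, d) with uniform weights, and it uses a KL-relaxed entropic Sinkhorn iteration as the unbalanced solver, a choice the abstract does not specify. All function and parameter names (`sliced_wasserstein`, `hierarchical_cost`, `unbalanced_sinkhorn`, `n_proj`, `eps`, `tau`) are hypothetical.

```python
import numpy as np


def sliced_wasserstein(x, y, directions):
    """Squared sliced 2-Wasserstein distance between two equal-size point sets.

    x, y: (n, d) arrays of local region features with uniform weights.
    directions: (p, d) array of unit-norm projection directions.
    """
    # Project onto each direction and sort: 1-D OT between equal-size
    # uniform samples matches the sorted projections.
    px = np.sort(x @ directions.T, axis=0)
    py = np.sort(y @ directions.T, axis=0)
    return float(np.mean((px - py) ** 2))


def hierarchical_cost(src, tgt, n_proj=64, seed=0):
    """Image-level OT as ground cost: one sliced distance per image pair.

    src: (m, n, d) source mini-batch, tgt: (k, n, d) target mini-batch.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_proj, src.shape[-1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    cost = np.empty((len(src), len(tgt)))
    for i, xi in enumerate(src):
        for j, yj in enumerate(tgt):
            cost[i, j] = sliced_wasserstein(xi, yj, directions)
    return cost


def unbalanced_sinkhorn(cost, eps=0.1, tau=1.0, n_iter=200):
    """Domain-level entropic OT with KL-relaxed marginals (assumed solver)."""
    m, k = cost.shape
    a = np.full(m, 1.0 / m)  # uniform mass on source images
    b = np.full(k, 1.0 / k)  # uniform mass on target images
    kernel = np.exp(-cost / eps)
    power = tau / (tau + eps)  # exponent < 1 softens the marginal constraints
    u, v = np.ones(m), np.ones(k)
    for _ in range(n_iter):
        u = (a / (kernel @ v)) ** power
        v = (b / (kernel.T @ u)) ** power
    return u[:, None] * kernel * v[None, :]


def hierarchical_distance(src, tgt, eps=0.1, tau=1.0):
    """Transport cost of the domain-level plan under the image-level cost."""
    cost = hierarchical_cost(src, tgt)
    plan = unbalanced_sinkhorn(cost, eps=eps, tau=tau)
    return float(np.sum(plan * cost)), plan


# Toy mini-batches: 4 source and 5 target images, 16 regions, 8-dim features.
rng = np.random.default_rng(1)
source = rng.normal(size=(4, 16, 8))
target = rng.normal(size=(5, 16, 8)) + 0.5
distance, plan = hierarchical_distance(source, target)
```

In this sketch the pairwise loop makes the cost matrix quadratic in the batch size, which is one reason to restrict the domain-level problem to mini-batches. The exponent `tau / (tau + eps)` controls how strictly the marginals are enforced, so larger `tau` moves the plan toward balanced OT.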
