Domain adaptation (DA) tackles distribution shift by learning a model on a source domain that generalizes to a target domain. However, most existing DA methods assume that source and target data lie in the same feature space, which limits their applicability in real-world settings. Heterogeneous DA (HeDA) methods have recently been introduced to handle heterogeneous feature spaces between the source and target domains. Despite their successes, current HeDA techniques fall short when both the feature space and the label space are mismatched. To address this, this paper explores a new DA scenario called open-set HeDA (OSHeDA). In OSHeDA, the model must not only handle heterogeneity in the feature space but also identify samples belonging to novel classes. To tackle this challenge, we first develop a novel theoretical framework that establishes learning bounds for the prediction error on the target domain. Guided by this framework, we propose a new DA method called Representation Learning for OSHeDA (RL-OSHeDA), which is designed to simultaneously transfer knowledge between heterogeneous data sources and identify novel classes. Experiments on text, image, and clinical data demonstrate the effectiveness of our algorithm. The model implementation is available at \url{https://github.com/pth1993/OSHeDA}.