Multi-domain learning (MDL) refers to simultaneously constructing a model or a set of models on datasets collected from different domains. Conventional approaches follow the shared-private framework (SP models), which emphasizes extracting domain-shared information while preserving domain-private information, and which offers significant advantages over single-domain learning. However, the limited availability of annotated data in each domain considerably hinders the effectiveness of conventional supervised MDL approaches in real-world applications. In this paper, we introduce a novel method called multi-domain contrastive learning (MDCL) to alleviate the impact of insufficient annotations by capturing semantic and structural information from both labeled and unlabeled data. Specifically, MDCL comprises two modules: inter-domain semantic alignment and intra-domain contrast. The former aims to align annotated instances of the same semantic category from distinct domains within a shared hidden space, while the latter focuses on learning a cluster structure of unlabeled instances in a private hidden space for each domain. MDCL is readily compatible with many SP models, requires no additional model parameters, and allows for end-to-end training. Experimental results across five textual and image multi-domain datasets demonstrate that MDCL yields noticeable improvements over various SP models. Furthermore, MDCL can be employed in multi-domain active learning (MDAL) to achieve a superior initialization, eventually leading to better overall performance.
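To make the two modules more concrete, the following is a minimal NumPy sketch of how such losses might look. It is not the formulation used in the paper, which the abstract does not specify. It assumes a supervised-contrastive loss for inter-domain semantic alignment (positives are labeled instances with the same class but a different domain) and an InfoNCE-style loss over two views of each unlabeled instance for intra-domain contrast. The function names, the `temperature` parameter, and the two-view setup are all hypothetical choices made for illustration.

```python
import numpy as np


def _normalize(x):
    # L2-normalize rows so that dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _log_softmax_rows(logits):
    # Numerically stable row-wise log-softmax; -inf entries get zero mass.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def inter_domain_alignment_loss(shared, labels, domains, temperature=0.5):
    """Assumed supervised-contrastive loss on shared-space features.

    shared:  (n, d) shared-space features of labeled instances
    labels:  (n,) class ids; domains: (n,) domain ids
    Positives for an anchor: same class, different domain (an assumption).
    """
    labels, domains = np.asarray(labels), np.asarray(domains)
    z = _normalize(np.asarray(shared, dtype=float))
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # never contrast an instance with itself
    log_prob = _log_softmax_rows(logits)

    same_class = labels[:, None] == labels[None, :]
    other_domain = domains[:, None] != domains[None, :]
    pos = same_class & other_domain
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0  # no cross-domain positive pairs in this batch

    # Mean log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())


def intra_domain_contrast_loss(view_a, view_b, temperature=0.5):
    """Assumed InfoNCE loss on private-space features of one domain.

    view_a, view_b: (n, d) features of two views (e.g. two augmentations)
    of the same n unlabeled instances; row i of each view is a positive pair,
    and all other rows of view_b act as negatives for row i of view_a.
    """
    a = _normalize(np.asarray(view_a, dtype=float))
    b = _normalize(np.asarray(view_b, dtype=float))
    log_prob = _log_softmax_rows(a @ b.T / temperature)
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, both terms would simply be added, with weighting coefficients, to the supervised objective of an existing SP model. Because they operate only on features the model already produces, this is consistent with the abstract's claim that no additional model parameters are needed.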