Unsupervised Domain Adaptation (UDA) is a popular technique that aims to reduce the domain shift between two data distributions. It has been successfully applied in computer vision and natural language processing. In this work, we explore the effects of various unsupervised domain adaptation techniques between two text classification tasks: fake and hyperpartisan news detection. We investigate knowledge transfer from fake to hyperpartisan news detection without using target labels during training. Specifically, we evaluate UDA, cluster alignment with a teacher, and cross-domain contrastive learning. Extensive experiments show that these techniques improve performance, and that adding data augmentation further enhances the results. In addition, we combine clustering and topic modeling algorithms with UDA, which improves performance over the initial UDA setup.