The most successful multi-domain text classification (MDTC) approaches adopt the shared-private paradigm, using domain-specific features to enhance domain-invariant ones, and apply adversarial training to align marginal feature distributions. These methods nevertheless face two main challenges: (1) ignoring class-aware information during adversarial alignment risks misalignment; (2) the limited labeled data available across multiple domains cannot guarantee sufficient discriminative capacity for the model. To address these issues, we propose Regularized Conditional Alignment (RCA), which aligns the joint distributions of domains and classes, thereby matching features within the same category and strengthening the discriminative power of the learned features. In addition, we employ entropy minimization and virtual adversarial training to constrain the uncertainty of predictions on unlabeled data and to improve the model's robustness. Empirical results on two benchmark datasets show that RCA outperforms state-of-the-art MDTC techniques.
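The entropy minimization term mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `entropy`, `entropy_minimization_loss`) and the toy logits are assumptions made for illustration, and a real model would compute this on classifier outputs for unlabeled text inside an automatic-differentiation framework. The sketch only shows the quantity being minimized, namely the mean Shannon entropy of the predicted class distributions on unlabeled examples.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one predicted distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def entropy_minimization_loss(batch_logits):
    """Mean prediction entropy over a batch of unlabeled examples.

    Minimizing this value pushes the classifier toward confident
    (low-uncertainty) predictions on unlabeled data.
    """
    return sum(entropy(softmax(l)) for l in batch_logits) / len(batch_logits)


# Hypothetical logits for a two-class (e.g. positive/negative) classifier.
uncertain_batch = [[0.0, 0.0]]  # equal scores: maximally uncertain
confident_batch = [[8.0, 0.0]]  # one class dominates: nearly certain

loss_uncertain = entropy_minimization_loss(uncertain_batch)
loss_confident = entropy_minimization_loss(confident_batch)
```

In this toy case the uncertain batch yields an entropy of about ln 2, the maximum for two classes, while the confident batch yields a value close to zero, which is the direction the regularizer pushes predictions on unlabeled data.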