Multi-domain text classification (MDTC) aims to leverage available resources from related domains to improve classification accuracy on the target domain. Currently, most MDTC approaches that adopt adversarial training and the shared-private paradigm achieve state-of-the-art performance. However, these methods face a non-negligible challenge: the absence of theoretical guarantees in the design of MDTC algorithms. This lack of theoretical underpinning poses a substantial obstacle to the advancement of MDTC algorithms. To address this problem, we first provide a theoretical analysis of MDTC by decomposing the MDTC task into multiple domain adaptation tasks. We incorporate the margin discrepancy as the measure of domain divergence and establish a new generalization bound based on Rademacher complexity. We then propose a margin discrepancy-based adversarial training (MDAT) approach for MDTC, in accordance with our theoretical analysis. To validate the efficacy of the proposed MDAT method, we conduct empirical studies on two MDTC benchmarks. The experimental results demonstrate that our MDAT approach surpasses state-of-the-art baselines on both datasets.