Recent work has highlighted the label alignment property (LAP) in supervised learning: the vector of all labels in the dataset lies mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages the predictions on the target domain to align with the top singular vectors of the target data matrix. Unlike conventional domain adaptation approaches that regularize representations, we instead regularize the classifier to align with the unlabeled target data, guided by the LAP in both the source and target domains. Theoretical analysis shows that, under certain assumptions, our solution lies within the span of the top right singular vectors of the target-domain data and aligns with the optimal solution. Because our analysis removes the reliance on the optimal joint risk assumption commonly made in classic domain adaptation theory, we can demonstrate the effectiveness of our method on problems where traditional domain adaptation methods often fall short due to high joint error. Additionally, we report improved performance over domain adaptation baselines on well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis.
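To make the two ideas concrete, here is a minimal NumPy sketch under stated assumptions. `label_alignment` measures the LAP as the fraction of a vector's squared norm that falls in the span of the top-k left singular vectors of a data matrix. `fit_label_aligned` is a hypothetical linear-regression instance of the regularizer: it fits the labeled source data by least squares and penalizes the part of the target predictions `Xt @ w` that lies outside the top-k singular directions of `Xt`. The function names, the squared-loss objective, and the parameters `k` and `lam` are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np


def label_alignment(X, y, k):
    """Fraction of ||y||^2 lying in the span of the top-k left singular vectors of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    proj = U[:, :k].T @ y  # coordinates of y along the top-k directions
    return float(proj @ proj / (y @ y))


def fit_label_aligned(Xs, ys, Xt, k, lam):
    """Hypothetical label-alignment-regularized least squares.

    Minimizes ||Xs w - ys||^2 + lam * ||U_bot^T Xt w||^2, where U_bot holds the
    left singular vectors of Xt beyond the first k. The penalty pushes the target
    predictions Xt w toward the span of the top-k singular vectors of Xt.
    Assumes Xs has full column rank so the linear system is solvable.
    """
    U, _, _ = np.linalg.svd(Xt, full_matrices=False)
    B = U[:, k:].T @ Xt            # maps w to the bottom components of Xt w
    A = Xs.T @ Xs + lam * B.T @ B  # normal equations of the penalized objective
    b = Xs.T @ ys
    return np.linalg.solve(A, b)
```

With `lam=0` this reduces to ordinary least squares on the source data; increasing `lam` cannot increase the norm of the target predictions' bottom singular components, so the target predictions become more label-aligned in the LAP sense.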