Recent work has highlighted the label alignment property (LAP) in supervised learning: the vector of all labels in the dataset lies mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages alignment between the predictions in the target domain and the top singular vectors of the target data matrix. Unlike conventional domain adaptation approaches that focus on regularizing representations, we instead regularize the classifier to align with the unsupervised target data, guided by the LAP in both the source and target domains. Theoretical analysis shows that, under certain assumptions, our solution lies in the span of the top right singular vectors of the target domain data and aligns with the optimal solution. By removing the reliance on the optimal joint risk assumption common in classic domain adaptation theory, we demonstrate the effectiveness of our method in addressing problems where traditional domain adaptation methods often fall short due to high joint error. Additionally, we report improved performance over domain adaptation baselines on well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis.
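The label alignment property described above can be checked numerically: take the SVD of the data matrix and measure how much of the label vector's energy falls in the span of the top few left singular vectors. The following is a minimal NumPy sketch on synthetic data constructed so that the labels are driven by the top singular direction (the data, the noise level, and the choice of k here are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: n examples, d features.
n, d, k = 200, 50, 5
X = rng.standard_normal((n, d))

# Thin SVD of the data matrix: X = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Construct labels dominated by the top left singular vector plus small
# noise, so the label alignment property holds by construction.
y = s[0] * U[:, 0] + 0.1 * rng.standard_normal(n)

# LAP diagnostic: fraction of the label vector's squared norm captured
# by projecting onto the top-k left singular vectors of X.
coeffs = U[:, :k].T @ y
alignment = np.sum(coeffs**2) / np.sum(y**2)
print(f"fraction of label energy in top-{k} singular subspace: {alignment:.3f}")
```

On real datasets exhibiting the LAP, the same diagnostic (with true labels, or target-domain predictions in place of `y`) would report a value close to 1 for small k, which is what motivates regularizing the classifier toward that subspace.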