Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for multi-source-free domain adaptation (MSFDA) typically train a target model on pseudo-labeled data produced by the source models, and they focus on improving the pseudo-labeling techniques or proposing new training objectives. Instead, we aim to analyze the fundamental limits of MSFDA. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model, which illustrates an inherent bias-variance trade-off. We then provide insights on how to balance this trade-off from three perspectives, namely domain aggregation, selective pseudo-labeling, and joint feature alignment, and these insights lead to the design of novel algorithms. Experiments on multiple datasets validate our theoretical analysis and demonstrate the state-of-the-art performance of the proposed algorithm, especially on some of the most challenging datasets, including Office-Home and DomainNet.
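As a rough illustration of the generic pseudo-labeling pipeline the abstract refers to, the following minimal sketch assumes that each source model outputs a class-probability vector for every unlabeled target sample. It combines those vectors with fixed, hypothetical per-source weights (a simple stand-in for domain aggregation) and keeps a pseudo-label only when the aggregated confidence clears a hypothetical threshold (a simple stand-in for selective pseudo-labeling). The function names, the weighted-average rule, and the thresholding rule are illustrative assumptions, not the algorithm proposed in this work.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: weighted average of the source models' class
# probabilities for a single target sample.
def aggregate_predictions(
    source_probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    total = sum(weights)  # normalize so the result is still a distribution
    num_classes = len(source_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, source_probs)) / total
        for c in range(num_classes)
    ]

# Hypothetical helper: keep (sample_index, label) only for samples whose
# aggregated top-class probability is at least `threshold`.
def select_pseudo_labels(
    batch_probs: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
    threshold: float,
) -> List[Tuple[int, int]]:
    selected = []
    for i, source_probs in enumerate(batch_probs):
        agg = aggregate_predictions(source_probs, weights)
        conf = max(agg)
        if conf >= threshold:
            selected.append((i, agg.index(conf)))  # argmax as pseudo-label
    return selected

if __name__ == "__main__":
    # Two target samples, each scored by two source models over three classes.
    batch = [
        [[0.9, 0.05, 0.05], [0.7, 0.2, 0.1]],
        [[0.4, 0.3, 0.3], [0.2, 0.5, 0.3]],
    ]
    print(select_pseudo_labels(batch, [0.5, 0.5], threshold=0.6))
```

In this sketch, raising `threshold` keeps fewer target samples but discards the less confident pseudo-labels; with the toy batch above, a threshold of 0.6 retains only the first sample (label 0).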