How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?

Machine learning models often degrade when deployed on a target distribution that differs from the source distributions they were trained on. Recent work in causality-based domain generalization has shown how causal structure shared between domains can induce invariant predictors, for example, models built on a subset of features whose risk is stable across structured domain shifts. However, the extent to which such population-level causal invariances lead to gains in finite-sample settings remains underexplored. In practice, we often have access to a few labeled target samples, a setting called supervised domain adaptation (sDA). In this paper, we explore when (full or partial) causal knowledge can provably improve supervised domain adaptation. As a first step, we study linear regression, where full or partial causal knowledge specifies a collection of invariant or possibly invariant feature subsets, each yielding a source-trained candidate predictor. We derive matching upper and lower bounds showing that finite-sample gains are governed by the target-risk margins separating the candidates, together with the finite-source estimation error. When these margins are sufficiently large relative to the target sample size $n_Q$, an adaptive aggregation procedure can match the best candidate predictor while avoiding negative transfer relative to target-only learning. Conversely, when the margins are too small, no algorithm can reliably exploit the candidate collection to obtain faster finite-sample rates. We further connect these margins to the magnitude of structural shift in linear SCMs and validate the theory on real-world causal benchmarks.
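To make the setting concrete, the following is a minimal sketch of choosing among source-trained candidate predictors and a target-only baseline using a few labeled target samples. It is not the adaptive aggregation procedure analyzed in the paper: plain hold-out selection is swapped in as an illustrative stand-in, and the names `fit_ols`, `select_predictor`, and `holdout_frac`, as well as the omission of an intercept, are assumptions made for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients without an intercept (minimum-norm if underdetermined)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def select_predictor(Xs, ys, Xt, yt, subsets, holdout_frac=0.5, seed=0):
    """Pick the predictor with the lowest held-out target risk.

    Xs, ys  : source features and labels
    Xt, yt  : the n_Q labeled target samples (assumes n_Q >= 2)
    subsets : candidate feature subsets (hypothetical input standing in for
              the invariant or possibly invariant subsets given by causal knowledge)
    """
    rng = np.random.default_rng(seed)
    n_q = Xt.shape[0]
    perm = rng.permutation(n_q)
    n_val = min(n_q - 1, max(1, int(holdout_frac * n_q)))
    val, fit = perm[:n_val], perm[n_val:]

    # One source-trained candidate per feature subset.
    candidates = {}
    for subset in subsets:
        cols = list(subset)
        candidates[tuple(subset)] = (cols, fit_ols(Xs[:, cols], ys))

    # Target-only baseline on all features, fit on the non-held-out target samples.
    all_cols = list(range(Xt.shape[1]))
    candidates["target_only"] = (all_cols, fit_ols(Xt[fit], yt[fit]))

    # Empirical squared-error risk of each predictor on the held-out target samples.
    risks = {}
    for name, (cols, coef) in candidates.items():
        residual = Xt[val][:, cols] @ coef - yt[val]
        risks[name] = float(np.mean(residual ** 2))

    best = min(risks, key=risks.get)
    return best, risks
```

Including the target-only baseline among the options is what guards, heuristically, against negative transfer in this sketch: if every source-trained candidate has poor target risk, the selection can fall back to the predictor fit on target data alone. Separating candidates from only `n_val` held-out samples is reliable only when their target risks differ by a wide enough margin, which is the quantity the bounds in the abstract are stated in terms of.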
