In this paper, we propose an invariant quantile regression (IQR) framework designed for multi-environment datasets, which captures the invariance across different environments. This model is closely related to transfer learning, causal inference, and fair machine learning, and is motivated by scenarios in which the conditional distribution of the response given the covariates varies across environments, while certain key features remain invariant. This perspective differs notably from previous work that restricts attention to the conditional mean, which is often insufficient in heterogeneous environments and whose estimators can be sensitive to ``bad'' environments or to changes in the shape of the noise distribution. In contrast, quantile-based invariance naturally accommodates heterogeneity and aligns more closely with structural causal models, in which variables that are invariant across environments at one or multiple quantile levels indicate potential stable causal predictors. Moreover, the set of endogenous variables under the IQR framework is typically larger than that under the conditional mean framework, which in turn promotes more effective exclusion of spurious (non-causal) predictors, provided that endogenous variables are not incorporated. To achieve this, we introduce a Kernel-Smoothed Focused Invariance Quantile Regression (KSFIQR) estimator, which leverages the underlying invariance structure and the heterogeneity among environments to ensure stable estimation across multiple environments. We establish the causal discovery properties of our method, demonstrate its ability to overcome the ``curse of endogeneity'', and derive an $\ell_2$ error bound for our estimator in the low-dimensional regime, all in a non-asymptotic framework. From an algorithmic perspective, we implement the estimator using the L-BFGS-B method and the Gumbel trick, and our numerical studies validate the proposed approach.
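To make the kernel-smoothing and L-BFGS-B ingredients concrete, the following is a minimal sketch of a kernel-smoothed quantile regression fitted on data pooled over environments. It is not the KSFIQR estimator: the focused invariance penalty and the Gumbel trick are omitted because their exact form is not given here. The Gaussian kernel, the bandwidth `h`, and the function names `smoothed_check_loss` and `fit_pooled_smoothed_qr` are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def smoothed_check_loss(beta, X, y, tau, h):
    """Gaussian-kernel-smoothed check loss and its gradient (illustrative).

    Convolving the check loss rho_tau(u) = u * (tau - 1{u < 0}) with a
    N(0, h^2) kernel gives
        l_h(u) = u * (tau - Phi(-u / h)) + h * phi(u / h),
    whose derivative in u is tau - Phi(-u / h).
    """
    u = y - X @ beta
    loss = u * (tau - norm.cdf(-u / h)) + h * norm.pdf(u / h)
    dloss_du = tau - norm.cdf(-u / h)
    # Chain rule: u depends on beta through -X @ beta.
    grad = -(X.T @ dloss_du) / len(y)
    return loss.mean(), grad


def fit_pooled_smoothed_qr(envs, tau=0.5, h=0.5):
    """Fit one coefficient vector on data pooled across environments.

    `envs` is a list of (X_e, y_e) pairs, one per environment. Pooling is an
    assumption of this sketch; no invariance penalty is applied.
    """
    X = np.vstack([X_e for X_e, _ in envs])
    y = np.concatenate([y_e for _, y_e in envs])
    beta0 = np.zeros(X.shape[1])
    result = minimize(
        smoothed_check_loss,
        beta0,
        args=(X, y, tau, h),
        jac=True,  # the objective returns (value, gradient)
        method="L-BFGS-B",
    )
    return result.x
```

Because the smoothed loss is differentiable, a quasi-Newton method such as L-BFGS-B can be applied directly, which is not the case for the non-smooth check loss. Calling `fit_pooled_smoothed_qr(envs, tau=0.5)` on a list of per-environment `(X, y)` arrays returns the pooled median-regression coefficients under this smoothing.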