Invariant quantile regression for heterogeneous environments

In this paper, we propose an invariant quantile regression (IQR) framework for multi-environment datasets that captures invariance across environments. The model is closely related to transfer learning, causal inference, and fair machine learning, and is motivated by scenarios in which the conditional distribution of the response given the covariates varies across environments while certain key features remain invariant. This perspective differs notably from previous work that restricts attention to the conditional mean, which is often insufficient in heterogeneous environments: the resulting estimators can be sensitive to ``bad'' environments or to changes in the shape of the noise distribution. In contrast, quantile-based invariance naturally accommodates heterogeneity and aligns more closely with structural causal models, in which variables that are invariant across environments at one or more quantile levels naturally indicate potential, stable causal predictors. Moreover, the set of endogenous variables under the IQR framework can typically be larger than that under the conditional mean framework, which in turn promotes more effective exclusion of spurious (non-causal) predictors, provided that endogenous variables are not incorporated. To achieve this, we introduce a Kernel-Smoothed Focused Invariance Quantile Regression (KSFIQR) estimator, which leverages the underlying invariance structure and the heterogeneity among environments to ensure stable estimation across multiple environments. We establish the causal discovery properties of our method, demonstrate its ability to overcome the ``curse of endogeneity'', and derive an $\ell_2$ error bound for our estimator in the low-dimensional regime, all in a non-asymptotic framework. From an algorithmic perspective, we implement the estimator with the L-BFGS-B method and the Gumbel trick, and our numerical studies validate the proposed approach.

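To make the kernel-smoothing and L-BFGS-B ingredients concrete, the following is a minimal sketch, not the KSFIQR estimator itself. It assumes a Gaussian kernel with a fixed bandwidth `h`, pools all environments into one objective, and omits both the focused invariance term and the Gumbel trick, since the abstract does not specify them. The function names, the bandwidth value, and the simulated two-environment data are hypothetical choices made for illustration. With a Gaussian kernel, the smoothed check loss has the closed form $\ell_h(u) = u\,(\tau - \Phi(-u/h)) + h\,\phi(u/h)$, with derivative $\tau - \Phi(-u/h)$, which is what makes a gradient-based solver such as L-BFGS-B applicable.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def smoothed_check_loss(beta, X, y, tau, h):
    """Gaussian-kernel-smoothed quantile loss and its gradient in beta."""
    u = y - X @ beta                      # residuals
    z = u / h
    # Closed form of the check loss convolved with a N(0, h^2) kernel.
    loss = np.mean(u * (tau - norm.cdf(-z)) + h * norm.pdf(z))
    # d loss / d u = tau - Phi(-u/h); chain rule through u = y - X beta.
    grad = -X.T @ (tau - norm.cdf(-z)) / len(y)
    return loss, grad


def fit_pooled_smoothed_qr(envs, tau=0.5, h=0.3):
    """Pool (X, y) pairs from all environments and minimise with L-BFGS-B.

    Assumption: simple pooling only; no invariance penalty is applied.
    """
    X = np.vstack([Xe for Xe, _ in envs])
    y = np.concatenate([ye for _, ye in envs])
    beta0 = np.zeros(X.shape[1])
    res = minimize(smoothed_check_loss, beta0, args=(X, y, tau, h),
                   jac=True, method="L-BFGS-B")
    return res.x


def make_env(rng, n, beta, noise_scale):
    """Hypothetical environment: same beta, environment-specific noise scale."""
    X = rng.normal(size=(n, len(beta)))
    y = X @ beta + noise_scale * rng.normal(size=n)
    return X, y


rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
# Two environments that differ only in noise scale; the conditional median
# is the same linear function of X in both.
envs = [make_env(rng, 2000, beta_true, 0.5),
        make_env(rng, 2000, beta_true, 2.0)]
beta_hat = fit_pooled_smoothed_qr(envs, tau=0.5, h=0.3)
print(beta_hat)
```

In this toy setup the noise is symmetric around zero in every environment, so the median-level coefficients are shared across environments and the pooled fit should land close to `beta_true`. An invariance-aware estimator would additionally penalise coefficients whose quantile fit differs across environments, which this sketch does not attempt.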