Quantile regression is a powerful tool for inferring how covariates affect specific percentiles of the response distribution. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi- or non-parametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter rely on complex, constrained models that can be difficult to interpret and inefficient to compute. Further, neither approach is well-suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantile-focused squared error loss, which enables efficient, closed-form computation and maintains a close relationship with Wasserstein-based density estimation. In an extensive simulation study, our methods demonstrate substantial gains in quantile estimation accuracy, variable selection, and inference over frequentist and Bayesian competitors. We apply these tools to identify the quantile-specific impacts of social and environmental stressors on educational outcomes for a large cohort of children in North Carolina.
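The abstract does not state the loss explicitly, so the following is only a minimal sketch under an assumed form: take the quantile-focused squared error loss to be the posterior expected sum of squared differences between the model-based conditional quantile Q(τ | x_i) and a linear action x_iᵀθ at the observed covariates. Under that assumption the expected loss splits into a posterior-variance term that does not depend on θ plus a least-squares term, so the optimal θ is ordinary least squares applied to the posterior mean of the quantile. The function names, the heteroskedastic Gaussian location-scale model, and the simulated "posterior draws" are all hypothetical stand-ins for MCMC output from an actual Bayesian regression model; projecting each draw to obtain intervals is likewise an illustrative assumption, not necessarily the paper's uncertainty quantification.

```python
import numpy as np
from statistics import NormalDist


def linear_quantile_estimate(X, quantile_draws):
    """Hypothetical helper: closed-form minimizer over theta of
    E[ sum_i (Q_i - x_i' theta)^2 | data ], assuming this is the loss.

    X: (n, p) design matrix (include a column of ones for an intercept).
    quantile_draws: (S, n) posterior draws of Q(tau | x_i) at each x_i.
    """
    # Expected loss = sum_i Var(Q_i) + ||E[Q] - X theta||^2; the variance term
    # does not involve theta, so only the posterior mean of Q matters.
    q_bar = quantile_draws.mean(axis=0)
    theta_hat, *_ = np.linalg.lstsq(X, q_bar, rcond=None)
    return theta_hat


def expected_quantile_loss(theta, X, quantile_draws):
    """Monte Carlo estimate of the assumed posterior expected squared error."""
    return np.mean(np.sum((quantile_draws - X @ theta) ** 2, axis=1))


# Hypothetical example: stand-in posterior draws for a Gaussian location-scale
# model y_i = x_i' beta + exp(x_i' gamma) * eps_i (not real MCMC output).
rng = np.random.default_rng(0)
n, S, tau = 200, 1000, 0.9
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0]) + 0.05 * rng.standard_normal((S, 2))
gamma = np.array([-1.0, 1.0]) + 0.05 * rng.standard_normal((S, 2))
z_tau = NormalDist().inv_cdf(tau)  # standard normal tau-quantile

# Q(tau | x_i) = x_i' beta + exp(x_i' gamma) * z_tau, one row per draw: (S, n)
quantile_draws = beta @ X.T + np.exp(gamma @ X.T) * z_tau

theta_hat = linear_quantile_estimate(X, quantile_draws)

# Illustrative uncertainty: project every draw onto X, giving (S, p) draws of
# the linear coefficients, then take 95% equal-tailed intervals.
theta_draws = np.linalg.lstsq(X, quantile_draws.T, rcond=None)[0].T
lower, upper = np.quantile(theta_draws, [0.025, 0.975], axis=0)
```

Because the projection is linear, the average of the projected draws equals the point estimate computed from the posterior mean, which keeps the point estimate and the intervals mutually consistent in this sketch.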