Quantile regression is a powerful tool for inferring how covariates affect specific percentiles of the response distribution. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi- or non-parametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter rely on complex, constrained models that can be difficult to interpret and computationally inefficient. Further, neither approach is well-suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantile-focused squared error loss, which enables efficient, closed-form computation and maintains a close relationship with Wasserstein-based density estimation. In an extensive simulation study, our methods demonstrate substantial gains in quantile estimation accuracy, variable selection, and inference over frequentist and Bayesian competitors. We apply these tools to identify the quantile-specific impacts of social and environmental stressors on educational outcomes for a large cohort of children in North Carolina.
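The abstract does not spell out the quantile-focused squared error loss, so the sketch below assumes one natural form: for a fixed quantile level tau, choose linear coefficients delta to minimize the posterior expectation of sum_i (Q_tau(x_i) - x_i' delta)^2, where Q_tau(x_i) is the model-based conditional quantile. Under that assumption the minimizer has a closed form: the least-squares projection of the posterior mean quantiles onto X, i.e. delta_hat = (X'X)^{-1} X' E[Q_tau(x) | data]. Projecting each posterior draw separately then yields a posterior for the linear coefficients, which can be summarized as credible intervals. The function name `linear_quantile_projection` and the heteroskedastic Gaussian model used to generate the draws are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from statistics import NormalDist


def linear_quantile_projection(X, q_draws):
    """Hypothetical sketch: project posterior quantile draws onto a linear model.

    X:       (n, p) design matrix (include a column of ones for an intercept).
    q_draws: (S, n) array; row s holds Q_tau(x_i), i = 1..n, under posterior draw s.
    Returns:
      delta_hat   -- (p,) minimizer of E[sum_i (Q_tau(x_i) - x_i' delta)^2 | data]
      delta_draws -- (S, p) per-draw least-squares projections, for uncertainty
    """
    # Solve X @ delta^(s) ~= q^(s) for every draw s at once (columns = draws).
    delta_draws = np.linalg.lstsq(X, q_draws.T, rcond=None)[0].T
    # Under the assumed squared error loss, the optimal point estimate is the
    # projection of the posterior mean quantile; by linearity it also equals
    # the average of the per-draw projections.
    q_bar = q_draws.mean(axis=0)
    delta_hat = np.linalg.lstsq(X, q_bar, rcond=None)[0]
    return delta_hat, delta_draws


# Synthetic example (assumed model, not data from the paper):
# y | x ~ N(x' beta, exp(x' gamma)^2), so Q_tau(x) = x' beta + exp(x' gamma) * z_tau.
rng = np.random.default_rng(0)
n, p, S, tau = 200, 3, 500, 0.9
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
z_tau = NormalDist().inv_cdf(tau)  # standard normal tau-quantile
beta_draws = np.array([1.0, 2.0, -1.0]) + 0.05 * rng.normal(size=(S, p))
gamma_draws = np.array([0.0, 0.3, 0.0]) + 0.05 * rng.normal(size=(S, p))
q_draws = beta_draws @ X.T + np.exp(gamma_draws @ X.T) * z_tau  # shape (S, n)

delta_hat, delta_draws = linear_quantile_projection(X, q_draws)
# 95% credible intervals for each linear quantile coefficient.
lo, hi = np.quantile(delta_draws, [0.025, 0.975], axis=0)
```

Because the loss is quadratic in delta, no iterative quantile (check-loss) optimization is needed. The cost is a single least-squares solve shared across all posterior draws, which is one way the closed-form computation mentioned above can arise.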