In modern data analysis, it is common to select a model before performing statistical inference. Selective inference tools adjust for the model selection process to ensure reliable inference after selection. In this paper, we introduce an asymptotic pivot for inference on the effects of selected variables on conditional quantile functions. Built on estimators from smoothed quantile regression, our proposed pivot is easy to compute and yields asymptotically exact selective inference without strict distributional assumptions on the response variable. At the core of our pivot is the use of external randomization variables, which allows us to use all available samples for both selection and inference, without partitioning the data into independent subsets or discarding samples at any step. In simulation studies, we find that (i) the asymptotic confidence intervals based on our pivot achieve the desired coverage rates, even in cases where sample splitting fails because too few samples remain for inference, and (ii) our intervals are consistently shorter than those produced by sample splitting across various models and signal settings. We report similar findings when we apply our approach to study risk factors for low birth weight in a publicly accessible dataset of US birth records from 2022.
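To make the two ingredients named above concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a Gaussian smoothing kernel with bandwidth `h` (the abstract does not specify a kernel or bandwidth), and it assumes the external randomization enters the selection objective as a linear term in `omega`, which is one form used in the randomized selective inference literature. The function names and the argument `lam` are hypothetical.

```python
import math


def check_loss(u, tau):
    """Standard quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h > 0.

    Assumption: Gaussian kernel; other kernels give other closed forms.
    """
    z = u / h
    return (h * std_normal_pdf(z)
            + 0.5 * u * (1.0 - 2.0 * std_normal_cdf(-z))
            + (tau - 0.5) * u)


def smoothed_check_grad(u, tau, h):
    """Derivative of the smoothed loss in u; it is continuous, unlike the check loss."""
    return tau - std_normal_cdf(-u / h)


def randomized_objective(beta, X, y, tau, h, lam, omega):
    """Hypothetical randomized selection objective (an assumed form):

    mean smoothed loss + lam * ||beta||_1 - omega . beta,
    where omega is an externally drawn randomization vector.
    """
    n = len(y)
    loss = 0.0
    for x_row, y_i in zip(X, y):
        resid = y_i - sum(b * x for b, x in zip(beta, x_row))
        loss += smoothed_check_loss(resid, tau, h)
    penalty = lam * sum(abs(b) for b in beta)
    perturbation = sum(w * b for w, b in zip(omega, beta))
    return loss / n + penalty - perturbation
```

As `h` shrinks, `smoothed_check_loss` approaches `check_loss`, while for any `h > 0` the loss is differentiable everywhere, which is what makes estimators based on it convenient for asymptotic arguments. Because `omega` is drawn independently of the data, every observation enters the selection objective, in contrast to sample splitting.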