It is well known that useful confidence intervals (CIs) for the mean or median of a response $Y$ conditional on features $X = x$ cannot be constructed without strong assumptions about the joint distribution of $X$ and $Y$. This paper introduces a new framework for reasoning about such problems by casting the conditional problem at different levels of resolution, ranging from coarse to fine localization. At each level, we consider local quantiles, defined as the marginal quantiles of $Y$ when $(X,Y)$ is resampled so that samples with $X$ near $x$ are up-weighted while the conditional distribution $Y \mid X$ is left unchanged. We then introduce the Weighted Quantile method, which asymptotically produces the uniformly most accurate confidence intervals for these local quantiles, whatever the (unknown) underlying distribution. A second method, the Quantile Rejection method, achieves finite-sample validity without any assumptions. Extensive numerical studies demonstrate that both methods are valid. In particular, we show that the Weighted Quantile procedure achieves nominal coverage once the effective sample size reaches the range of 10 to 20.
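To make the notion of a local quantile concrete, the following is a minimal sketch of a plug-in point estimate: the weighted empirical quantile of $Y$, with weights that grow as $X$ approaches $x$. It is an illustration under stated assumptions, not the Weighted Quantile or Quantile Rejection procedure itself, and it produces no confidence interval. The Gaussian kernel, the bandwidth parameter, the Kish formula for effective sample size, and all function names are assumptions chosen for the example; the abstract does not specify any of them.

```python
import numpy as np


def gaussian_weights(X, x0, bandwidth):
    """Hypothetical localization kernel: up-weights samples with X near x0.

    A smaller bandwidth means finer localization (assumed kernel form).
    """
    X = np.asarray(X, dtype=float)
    return np.exp(-0.5 * ((X - x0) / bandwidth) ** 2)


def weighted_quantile(y, w, q):
    """Smallest y whose normalized cumulative weight is at least q."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cdf = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = np.searchsorted(cdf, q, side="left")
    # Guard against floating-point round-off pushing idx past the end.
    return y_sorted[min(idx, len(y_sorted) - 1)]


def kish_effective_sample_size(w):
    """Kish's formula (sum w)^2 / sum w^2; an assumed definition here."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=2000)
    Y = X + rng.normal(scale=0.5, size=2000)

    for bandwidth in (1.0, 0.3, 0.05):  # coarse to fine localization
        w = gaussian_weights(X, x0=0.5, bandwidth=bandwidth)
        print(
            f"bandwidth={bandwidth}: "
            f"local median ~ {weighted_quantile(Y, w, 0.5):.3f}, "
            f"effective n ~ {kish_effective_sample_size(w):.1f}"
        )
```

Shrinking the bandwidth moves the target from a near-marginal quantile toward the conditional one, at the cost of a smaller effective sample size. This is the coarse-to-fine trade-off the framework is built around.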