It is well known that it is impossible to construct useful confidence intervals (CIs) about the mean or median of a response $Y$ conditional on features $X = x$ without making strong assumptions about the joint distribution of $X$ and $Y$. This paper introduces a new framework for reasoning about problems of this kind by casting the conditional problem at different levels of resolution, ranging from coarse to fine localization. In each of these problems, we consider local quantiles defined as the marginal quantiles of $Y$ when $(X,Y)$ is resampled in such a way that samples $X$ near $x$ are up-weighted while the conditional distribution $Y \mid X$ does not change. We then introduce the Weighted Quantile method, which asymptotically produces the uniformly most accurate confidence intervals for these local quantiles no matter the (unknown) underlying distribution. Another method, namely, the Quantile Rejection method, achieves finite sample validity under no assumption whatsoever. We conduct extensive numerical studies demonstrating that both of these methods are valid. In particular, we show that the Weighted Quantile procedure achieves nominal coverage as soon as the effective sample size is in the range of 10 to 20.
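To make the notion of a local quantile concrete, the following is a minimal sketch of the point estimate only: the weighted empirical quantile of $Y$ when samples with $X$ near $x$ are up-weighted. It is not the Weighted Quantile or Quantile Rejection procedure, and it does not produce a confidence interval. The Gaussian kernel, the `bandwidth` parameter, the one-dimensional $X$, the Kish formula used for the effective sample size, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def gaussian_weights(X, x0, bandwidth):
    """Hypothetical localizing kernel: up-weights samples whose X is near x0.

    A smaller bandwidth gives finer localization; a larger one is coarser.
    """
    X = np.asarray(X, dtype=float)
    return np.exp(-0.5 * ((X - x0) / bandwidth) ** 2)


def weighted_quantile(y, w, q):
    """Smallest observed y whose normalized cumulative weight reaches q."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum = np.cumsum(w_sorted) / np.sum(w_sorted)
    # First index at which the cumulative weight is >= q.
    idx = np.searchsorted(cum, q, side="left")
    # Guard against floating-point rounding leaving cum[-1] slightly below 1.
    return y_sorted[min(idx, len(y_sorted) - 1)]


def effective_sample_size(w):
    """Kish effective sample size (assumed definition): (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return np.sum(w) ** 2 / np.sum(w ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=500)
    Y = X ** 2 + rng.normal(scale=0.1, size=500)

    # Compare a coarse and a fine localization around x = 0.5.
    for bandwidth in (1.0, 0.05):
        w = gaussian_weights(X, x0=0.5, bandwidth=bandwidth)
        print(
            f"bandwidth={bandwidth}: "
            f"local median={weighted_quantile(Y, w, 0.5):.3f}, "
            f"effective n={effective_sample_size(w):.1f}"
        )
```

Shrinking the bandwidth moves the target closer to the conditional median at $x$ but lowers the effective sample size, which is the trade-off between coarse and fine localization that the abstract describes.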