This paper studies quantile regression in a data-limited setting where the gold-standard outcome is observed for only a small number of observations, whereas a surrogate outcome is widely available. Such settings are becoming increasingly common with the availability of low-cost predictions from modern AI, motivating a growing line of research on "prediction-powered inference" for improved statistical inference. Naively extending this framework to quantile regression, however, raises two challenges: computational difficulties caused by the discontinuity of the check-loss subgradient, and overly conservative confidence intervals. To address these issues, we propose a convolution-based smoothing of the check-loss objective and develop two variants of the resulting estimator. The proposed estimators are computationally tractable, and our numerical studies show that they mitigate overcoverage. As a theoretical contribution, we establish the asymptotic distributions of the proposed estimators under a possibly misspecified linear quantile regression model. We further propose an ensemble of the two estimators and illustrate the proposed methods through simulations and an application to a local housing dataset.
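To illustrate why convolution smoothing removes the discontinuity in the subgradient, here is a minimal sketch of a smoothed check loss. It assumes a Gaussian kernel with a hypothetical bandwidth `h`. The abstract does not specify the kernel, the bandwidth, or the form of the two estimator variants, so the function names and parameters below are illustrative only and are not the authors' implementation. For a Gaussian kernel, the convolution of the check loss with the kernel has a closed form, and its derivative is the smooth function `tau - Phi(-u / h)` rather than the step function `tau - 1{u < 0}`.

```python
import math


def check_loss(u, tau):
    """Standard (non-smooth) check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def _norm_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h (assumed kernel).

    Uses rho_tau(u) = |u| / 2 + (tau - 1/2) * u together with the
    folded-normal mean E|u + h Z| for Z ~ N(0, 1).
    """
    a = u / h
    folded_mean = (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * a * a)
                   + a * (1.0 - 2.0 * _norm_cdf(-a)))
    return 0.5 * h * folded_mean + (tau - 0.5) * u


def smoothed_check_grad(u, tau, h):
    """Derivative of the smoothed loss in u; continuous everywhere, unlike the subgradient."""
    return tau - _norm_cdf(-u / h)
```

As `h` shrinks toward zero, the smoothed loss approaches the original check loss, and the gradient approaches the discontinuous subgradient. A larger `h` gives a smoother objective at the cost of additional smoothing bias.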