Quantile regression, a robust method for estimating conditional quantiles, has advanced significantly in fields such as econometrics, statistics, and machine learning. In high-dimensional settings, where the number of covariates exceeds the sample size, penalized methods such as the lasso have been developed to exploit sparsity. Bayesian methods, initially connected to quantile regression through the asymmetric Laplace likelihood, have also evolved, though issues with posterior variance have motivated new approaches, including pseudo- and score-based likelihoods. This paper presents a novel probabilistic machine learning approach to high-dimensional quantile prediction. It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation. The method comes with strong theoretical guarantees: PAC-Bayes bounds yield non-asymptotic oracle inequalities, which show minimax-optimal prediction error and adaptation to unknown sparsity. Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
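To make the ingredients named above concrete, the following is a minimal sketch of a pseudo-posterior built from the check (pinball) loss and sampled with unadjusted Langevin Monte Carlo. It is not the paper's implementation, and several details are assumptions: the prior is taken to be proportional to the product over j of (s^2 + beta_j^2)^(-2), one common form of a scaled Student-t prior; the temperature `lam`, scale `s`, and step size `step` are illustrative values; the non-smooth check loss is handled with a subgradient; and the function names are hypothetical.

```python
import numpy as np


def pinball_loss(residuals, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    return residuals * (tau - (residuals < 0))


def grad_log_pseudo_posterior(beta, X, y, tau, lam, s):
    """(Sub)gradient of the log pseudo-posterior with respect to beta.

    Assumed target: -lam * sum_i rho_tau(y_i - x_i' beta)
                    - 2 * sum_j log(s^2 + beta_j^2).
    """
    residuals = y - X @ beta
    # Subgradient of the tempered check-loss term.
    grad_fit = lam * X.T @ (tau - (residuals < 0))
    # Gradient of the assumed scaled Student-t log-prior.
    grad_prior = -4.0 * beta / (s**2 + beta**2)
    return grad_fit + grad_prior


def langevin_sample(X, y, tau, lam=1.0, s=0.1, step=1e-4,
                    n_iter=2000, burn_in=1000, seed=0):
    """Unadjusted Langevin updates; returns the draws kept after burn-in."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    draws = []
    for t in range(n_iter):
        noise = rng.standard_normal(beta.shape)
        grad = grad_log_pseudo_posterior(beta, X, y, tau, lam, s)
        # Gradient drift plus Gaussian noise scaled by sqrt(2 * step).
        beta = beta + step * grad + np.sqrt(2.0 * step) * noise
        if t >= burn_in:
            draws.append(beta.copy())
    return np.asarray(draws)
```

Under these assumptions, a point predictor for the tau-th conditional quantile could be formed as `X_new @ draws.mean(axis=0)`. Because the updates are not Metropolis-corrected, the step size would need tuning in practice.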