In this paper, we introduce a kNN-based regression method that combines the scalability and adaptability of traditional non-parametric kNN models with a novel variable selection technique. The method focuses on accurately estimating the conditional mean and variance of a random response variable, and thereby characterizes conditional distributions across diverse scenarios. Our approach incorporates a robust uncertainty quantification mechanism that builds on our prior work on estimating the conditional mean and variance. Using kNN makes the computation of prediction intervals scalable and yields statistical accuracy in line with optimal non-parametric rates. Additionally, we introduce a new semi-parametric kNN algorithm for estimating ROC curves while accounting for covariates. For selecting the smoothing parameter k, we propose an algorithm with theoretical guarantees. Incorporating variable selection significantly improves the performance of the method over conventional kNN techniques in various modeling tasks. We validate the approach through simulations in low-, moderate-, and high-dimensional covariate spaces. The algorithm's effectiveness is particularly notable in biomedical applications, as demonstrated in two case studies. We conclude with a theoretical analysis that highlights the consistency of our method and its convergence-rate advantage over traditional kNN models, particularly when the underlying regression model takes values in a low-dimensional space.
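To make the core idea concrete, the following is a minimal sketch of plain kNN estimation of a conditional mean and variance, followed by a simple prediction interval. It is not the algorithm proposed in the paper: it omits variable selection and the data-driven choice of k, it assumes Euclidean distance, and the interval assumes an approximately Gaussian conditional distribution. The function names `knn_mean_variance` and `gaussian_interval` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_mean_variance(X_train, y_train, X_query, k):
    """Plain kNN estimates of E[Y | X = x] and Var[Y | X = x].

    Illustrative only: Euclidean distance, no variable selection,
    and k is supplied by the caller (1 <= k <= number of training points).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diff = X_query[:, None, :] - X_train[None, :, :]
    d2 = (diff ** 2).sum(axis=2)

    # Indices of the k nearest training points for each query point.
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    neigh_y = y_train[idx]

    # Local average, then local average of squared residuals.
    mean = neigh_y.mean(axis=1)
    var = ((neigh_y - mean[:, None]) ** 2).mean(axis=1)
    return mean, var


def gaussian_interval(mean, var, z=1.96):
    """Interval mean +/- z * sd (assumes a roughly Gaussian conditional law)."""
    half_width = z * np.sqrt(var)
    return mean - half_width, mean + half_width


# Small usage example on toy data.
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
mean, var = knn_mean_variance(X_train, y_train, np.array([[1.0]]), k=3)
lower, upper = gaussian_interval(mean, var)
```

The brute-force distance matrix keeps the sketch short; for larger samples a spatial index or an approximate nearest-neighbor search would be the more natural choice.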