In many important statistical analyses, the number of covariates $p$ exceeds the sample size $n$, a regime commonly referred to as high-dimensional. While considerable progress has been made in high-dimensional regression under the assumption of error-free covariates, real-world data frequently involve noisy or corrupted measurements. When left unaddressed, measurement errors can silently distort the analysis and lead to misleading conclusions. This paper reviews and evaluates several recommended statistical inference methods for high-dimensional regression in the presence of mismeasured covariates. We discuss four penalized regression methods (ridge, lasso, the Dantzig selector, and the elastic net) alongside their measurement-error-corrected variants, and conduct a comparative study under linear additive and uncorrelated measurement error models. Through simulation studies and a real application to high-dimensional medical genetic data, we illustrate the methods studied, show that the choice of correction procedure is problem-specific, and provide practical recommendations to help practitioners navigate this methodological landscape.
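The distortion mentioned above is easiest to see in the simplest possible setting. The sketch below is an illustration only, not one of the methods studied in this paper: it assumes a single covariate, a linear additive error model $W = X + U$ with $U$ independent of $X$, and a known error variance $\sigma_U^2$ (in practice $\sigma_U^2$ usually has to be estimated, e.g. from replicate measurements). The naive least-squares slope computed from $W$ is attenuated toward zero by the reliability ratio $\sigma_X^2/(\sigma_X^2+\sigma_U^2)$, whereas subtracting $\sigma_U^2$ from the empirical second moment of $W$ (the same moment correction that some corrected penalized estimators apply to the Gram matrix $W^\top W/n$) removes this bias. The function `simulate` and its parameters are hypothetical names chosen for this example.

```python
import random


def simulate(n=100_000, beta=2.0, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Compare naive and moment-corrected slopes under W = X + U.

    Hypothetical illustration: one covariate, no intercept (all means are zero),
    and a measurement error variance sigma_u**2 that is assumed to be known.
    """
    rng = random.Random(seed)
    # True covariate X and its error-prone surrogate W = X + U.
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
    # Response depends on the true covariate X, not on W.
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]

    # Empirical moments computed from the observed surrogate W.
    s_wy = sum(wi * yi for wi, yi in zip(w, y)) / n
    s_ww = sum(wi * wi for wi in w) / n

    # Naive slope: biased toward zero by sigma_x^2 / (sigma_x^2 + sigma_u^2).
    naive = s_wy / s_ww
    # Corrected slope: remove the error variance from the second moment of W.
    corrected = s_wy / (s_ww - sigma_u ** 2)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

With $\beta = 2$ and $\sigma_X = \sigma_U = 1$, the reliability ratio is $1/2$, so the naive slope should land near 1.0 while the corrected slope should land near the true value 2.0. In high dimensions the subtraction makes the corrected Gram matrix indefinite when $p > n$, which is one reason the corrected penalized estimators require additional constraints or modifications.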