Statistical inference for high-dimensional regression coefficients is challenging because the uncertainty introduced by the model selection procedure is hard to account for. A critical question remains unsettled: is it possible to embed inference about the model into simultaneous inference for the coefficients, and if so, how? To this end, we propose a notion of simultaneous confidence intervals called the sparsified simultaneous confidence intervals. Our intervals are sparse in the sense that some of them have both their upper and lower bounds shrunk to zero (i.e., $[0,0]$), indicating that the corresponding covariates are unimportant and should be excluded from the final model. The remaining intervals either contain zero (e.g., $[-1,1]$ or $[0,1]$), indicating plausible covariates, or do not contain zero (e.g., $[2,3]$), indicating significant covariates. The proposed method can be coupled with various selection procedures, making it well suited to comparing the uncertainty of those procedures. For the proposed method, we establish desirable asymptotic properties, develop intuitive graphical tools for visualization, and demonstrate its superior performance through simulation and real data analysis.
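The three-way reading of the intervals described in the abstract can be illustrated with a minimal sketch. It covers only the interpretation step, not the construction of the intervals, which the abstract does not specify. The function name `classify_interval` and the labels it returns are hypothetical, chosen here for illustration.

```python
from typing import List, Tuple


def classify_interval(lower: float, upper: float) -> str:
    """Label one interval using the reading given in the abstract.

    Hypothetical helper: the name and the string labels are illustrative
    assumptions, not part of the proposed method.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower == 0.0 and upper == 0.0:
        # Both bounds shrunk to zero: covariate excluded from the final model.
        return "unimportant"
    if lower <= 0.0 <= upper:
        # Interval contains zero but is not degenerate.
        return "plausible"
    # Interval lies strictly on one side of zero.
    return "significant"


# The example intervals quoted in the abstract.
intervals: List[Tuple[float, float]] = [
    (0.0, 0.0),
    (-1.0, 1.0),
    (0.0, 1.0),
    (2.0, 3.0),
]
labels = [classify_interval(lo, hi) for lo, hi in intervals]
print(labels)
```

Under this reading, $[0,1]$ is labelled plausible rather than significant because zero is one of its endpoints, which matches the examples given in the abstract.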