This paper presents a novel method for statistical inference on both the model support and the regression coefficients in a high-dimensional logistic regression model. Our method is based on the repro samples framework, in which we conduct statistical inference by generating artificial samples that mimic the actual data-generating process. The proposed method has two major advantages. First, for the model support, we introduce the first method for constructing a model confidence set in a high-dimensional setting, and the method requires only a weak signal strength assumption. Second, for the regression coefficients, we establish confidence sets for any group of linear combinations of regression coefficients. Our simulation results demonstrate that the proposed method produces valid and small model confidence sets and achieves better coverage for regression coefficients than state-of-the-art debiasing methods. Additionally, we analyze single-cell RNA-seq data on the immune response. In addition to identifying genes previously shown in the literature to be relevant, our method discovers a significant gene that has not been studied before, revealing a potential new direction for understanding cellular immune response mechanisms.