Choosing relevant predictors is central to the analysis of biomedical time-to-event data. Classical frequentist inference, however, presumes that the set of covariates is fixed in advance and does not account for data-driven variable selection. As a consequence, naive post-selection inference may be biased and misleading. In right-censored survival settings, these issues may be exacerbated by the additional uncertainty induced by censoring. We investigate several procedures for inference on the regression coefficients after variable selection with the Lasso and its extension, the adaptive Lasso, in the Cox model. The methods considered are sample splitting, exact post-selection inference, and the debiased Lasso. Their performance is examined in a neutral simulation study reflecting realistic covariate structures and censoring rates commonly encountered in biomedical applications. To complement the simulation results, we illustrate the practical behavior of these procedures in an applied example using a publicly available survival dataset.