With the development of data collection techniques, analysis with a survival response and high-dimensional covariates has become routine. Here we consider an interaction model that includes a set of low-dimensional covariates, a set of high-dimensional covariates, and their interactions. This model is motivated by gene-environment (G-E) interaction analysis, where the E variables are low-dimensional and the G variables are high-dimensional. For such a model, there has been extensive research on estimation and variable selection. In comparison, inference with valid false discovery rate (FDR) control has been very limited. Existing high-dimensional inference tools cannot be directly applied to interaction models, because interactions and main effects are not ``equal''. In this article, for high-dimensional survival analysis with interactions, we model survival using the accelerated failure time (AFT) model and adopt a ``weighted least squares + debiased Lasso'' approach for estimation and selection. We develop a hierarchical FDR control approach for inference that respects the ``main effects, interactions'' hierarchy. The asymptotic distributional properties of the debiased Lasso estimators are rigorously established. Simulations demonstrate the satisfactory performance of the proposed approach, and the analysis of a breast cancer dataset further establishes its practical utility.
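The ``weighted least squares + debiased Lasso'' pipeline can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. The weights are taken to be Kaplan-Meier (Stute) weights on the log survival times, a common choice for weighted least squares AFT estimation. The interaction design stacks E main effects, G main effects, and all E x G products. Debiasing is shown for a single coordinate using a nodewise-Lasso projection (the low-dimensional projection idea of Zhang and Zhang). All function names are hypothetical. The sketch omits variance estimation, tuning-parameter selection, and the hierarchical FDR step.

```python
import numpy as np


def km_weights(y, delta):
    """Kaplan-Meier (Stute) weights, returned in the original order of y.

    y: observed log times; delta: 1 = event, 0 = censored (censored get weight 0).
    """
    n = len(y)
    order = np.argsort(y, kind="mergesort")
    d = np.asarray(delta, dtype=float)[order]
    w_sorted = np.zeros(n)
    surv = 1.0  # running product over earlier ranks of ((n-j)/(n-j+1))^delta_j
    for i in range(n):  # 0-based index; rank is i + 1
        w_sorted[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    w = np.empty(n)
    w[order] = w_sorted
    return w


def interaction_design(E, G):
    """Columns: E main effects, G main effects, then E_k * G_j (k-major order)."""
    inter = (E[:, :, None] * G[:, None, :]).reshape(E.shape[0], -1)
    return np.hstack([E, G, inter])


def weighted_center(X, y, w):
    """Absorb the intercept and weights so sum_i w_i (y_i - x_i b)^2 = ||ys - Xs b||^2."""
    sw = w.sum()
    xbar, ybar = w @ X / sw, w @ y / sw
    s = np.sqrt(w)
    return s[:, None] * (X - xbar), s * (y - ybar)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = y.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != b[j]:
                r -= X[:, j] * (new - b[j])  # keep residual in sync
                max_step = max(max_step, abs(new - b[j]))
                b[j] = new
        if max_step < tol:
            break
    return b


def debiased_coef(X, y, b, j, lam_node):
    """One-step debiased estimate of coefficient j via a nodewise-Lasso residual z_j."""
    others = np.delete(np.arange(X.shape[1]), j)
    gamma = lasso_cd(X[:, others], X[:, j], lam_node)
    z = X[:, j] - X[:, others] @ gamma
    return b[j] + z @ (y - X @ b) / (z @ X[:, j])
```

In use, one would compute `w = km_weights(log_time, delta)`, form `X = interaction_design(E, G)`, center with `weighted_center`, fit `lasso_cd`, and then call `debiased_coef` for each main effect or interaction of interest. Only the weighted centering step touches the censoring. After it, the problem is an ordinary Lasso, which is what makes standard debiasing ideas applicable.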