The proportional hazards (PH) and accelerated failure time (AFT) models are the most widely used hazard structures for analysing time-to-event data. When the goal is to identify variables associated with event times, variable selection is typically performed within a single hazard structure, imposing strong assumptions on how covariates affect the hazard function. To allow simultaneous selection of relevant variables and the hazard structure itself, we develop a Bayesian variable selection approach within the general hazard (GH) model, which includes the PH, AFT, and other structures as special cases. We propose two types of g-priors for the regression coefficients that enable tractable computation and show that both lead to consistent model selection. We also introduce a hierarchical prior on the model space that accounts for multiplicity and penalises model complexity. To efficiently explore the GH model space, we extend the Add-Delete-Swap algorithm to jointly sample variable inclusion indicators and hazard structures. Simulation studies show accurate recovery of both the true hazard structure and active variables across different sample sizes and censoring levels. Two real-data applications are presented to illustrate the use of the proposed methodology and to compare it with existing variable selection methods.
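For orientation, the nesting of hazard structures can be sketched with the formulation of the GH model commonly used in the literature. The abstract does not state the model explicitly, so the notation below is an assumption: a baseline hazard $h_0$, covariate vector $x$, a coefficient vector $\alpha$ acting on the time scale, and a coefficient vector $\beta$ acting on the hazard scale.

```latex
% Assumed standard form of the general hazard (GH) model; notation is illustrative.
\[
  h(t \mid x) \;=\; h_0\!\left(t \, e^{x^{\top}\alpha}\right) e^{x^{\top}\beta}
\]
% Special cases under this assumed parameterisation:
%   \alpha = 0      :  h(t \mid x) = h_0(t)\, e^{x^{\top}\beta}                          (PH)
%   \alpha = \beta  :  h(t \mid x) = h_0\!\left(t\, e^{x^{\top}\beta}\right) e^{x^{\top}\beta}   (AFT)
%   \beta = 0       :  h(t \mid x) = h_0\!\left(t\, e^{x^{\top}\alpha}\right)             (accelerated hazards)
```

Under this sketch, choosing a hazard structure amounts to choosing which of these constraints on $(\alpha, \beta)$ holds, while variable selection determines which entries of $\alpha$ and $\beta$ are nonzero.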