Focused Information Criteria for Semiparametric Linear Hazard Regression

From arXiv: 16 pages, 4 figures, 3 tables. Statistical Research Report, Department of Mathematics, University of Oslo, February 2009; posted to arXiv in March 2026. The paper was accepted by Biometrika in 2010, modulo "minor changes", but things then slipped off our tables.

The semiparametric linear hazard regression model introduced by McKeague and Sasieni (1994) extends the linear hazard regression model developed by Aalen (1980). Methods of model selection for this type of model are still underdeveloped. When fitting a semiparametric linear hazard regression model, one usually starts with a given set of covariates. For each covariate there are at least three choices: allow it to have a time-varying effect; allow it to have a constant effect over time; or exclude it from the model. In this paper we discuss focused information criteria (FIC) to help with this choice. In the spirit of Claeskens and Hjort (2003, 2008), 'focused' means that one is interested in one specific quantity, e.g. the probability that a patient with a certain set of covariates survives up to a given time. The FIC involves estimating the mean squared error of the estimator of this quantity under each candidate model, and the chosen model is the one minimising the estimated mean squared error. The focused model selection machinery is extended to allow for weighted versions, leading to a wFIC method that aims at finding models that give good estimates of a given list of parameters, such as survival probabilities for a subset of patients or for a specified region of covariate vectors. In addition to developing model selection criteria, we discuss methods for averaging across the best models. We illustrate these model selection methods on real data.
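
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's estimator: it only enumerates the three roles per covariate and picks the candidate with the smallest score of the form squared bias plus variance. The names `candidate_models`, `select_model`, `toy_bias` and `toy_variance` are hypothetical, and the toy bias and variance functions are made-up placeholders standing in for the model-based estimates a real FIC computation would supply.

```python
from itertools import product

# The three roles a covariate can take, as listed in the abstract.
ROLES = ("time-varying", "constant", "excluded")


def candidate_models(covariates):
    """Yield every assignment of one role to each covariate (3**p models)."""
    for roles in product(ROLES, repeat=len(covariates)):
        yield dict(zip(covariates, roles))


def select_model(covariates, estimate_bias, estimate_variance):
    """Return the candidate minimising bias**2 + variance, with its score.

    estimate_bias and estimate_variance are caller-supplied (hypothetical)
    functions giving the estimated bias and variance of the focus estimator
    under a given candidate model.
    """
    best_model, best_score = None, float("inf")
    for model in candidate_models(covariates):
        score = estimate_bias(model) ** 2 + estimate_variance(model)
        if score < best_score:
            best_model, best_score = model, score
    return best_model, best_score


# Toy placeholders (assumptions for illustration only): excluding a
# covariate adds bias, while richer roles add variance.
def toy_bias(model):
    return 0.3 * sum(role == "excluded" for role in model.values())


def toy_variance(model):
    per_role = {"time-varying": 0.2, "constant": 0.05, "excluded": 0.0}
    return sum(per_role[role] for role in model.values())


best, score = select_model(["age", "sex"], toy_bias, toy_variance)
print(best, score)
```

With p covariates the enumeration visits 3**p candidates, so an exhaustive search like this is only practical for a modest number of covariates. A weighted variant in the spirit of wFIC would presumably average such scores over a list of focus parameters, but the sketch does not attempt that.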
