The semiparametric linear hazard regression model introduced by McKeague and Sasieni (1994) is an extension of the linear hazard regression model developed by Aalen (1980). Methods of model selection for this type of model are still underdeveloped. When fitting a semiparametric linear hazard regression model, one usually starts with a given set of covariates. For each covariate there are at least three choices: allow it to have a time-varying effect, allow it to have a constant effect over time, or exclude it from the model. In this paper we discuss focused information criteria (FIC) to help with this choice. In the spirit of Claeskens and Hjort (2003, 2008), "focused" means that one is interested in one specific quantity, e.g. the probability that a patient with a certain set of covariates survives up to a given time. The FIC involves estimating the mean squared error of the estimator of the quantity of interest, and the chosen model is the one that minimises this estimated mean squared error. The focused model selection machinery is extended to allow for weighted versions, leading to a wFIC method that aims at finding models giving good estimates of a given list of parameters, such as survival probabilities for a subset of patients or for a specified region of covariate vectors. In addition to developing model selection criteria, we also discuss methods for averaging across the best models. We illustrate these model selection methods on a real data set.
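The selection step described above can be pictured with a minimal sketch. It is not the paper's implementation: the role labels, the function names, and the callable `estimate_bias_and_variance` are hypothetical placeholders, and the score is simply squared bias plus variance, which is the usual decomposition of mean squared error. The sketch enumerates every assignment of the three roles to the covariates and keeps the assignment with the smallest estimated score for the focus quantity.

```python
from itertools import product

# The three roles a covariate can take, as listed in the abstract.
ROLES = ("time-varying", "constant", "excluded")


def candidate_models(covariates):
    """Yield every assignment of a role to each covariate (3**p models)."""
    for roles in product(ROLES, repeat=len(covariates)):
        yield dict(zip(covariates, roles))


def select_by_estimated_mse(covariates, estimate_bias_and_variance):
    """Return (model, score) minimising squared bias plus variance.

    `estimate_bias_and_variance` is a hypothetical user-supplied callable
    that returns (bias, variance) estimates for the focus estimator
    under the given model.
    """
    best_model, best_score = None, float("inf")
    for model in candidate_models(covariates):
        bias, variance = estimate_bias_and_variance(model)
        score = bias ** 2 + variance  # estimated mean squared error
        if score < best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

With p covariates the search visits 3^p candidates, so exhaustive enumeration is only practical for a moderate number of covariates. How the bias and variance estimates are obtained for the focus quantity is the substance of the FIC itself and is deliberately left outside this sketch.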