In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures, we define summaries of variable importance trajectories. These measures can be estimated, and the same inferential approaches applied, regardless of which algorithm(s) are used to estimate the prediction function. We propose a nonparametric efficient estimation and inference procedure, as well as a null hypothesis testing procedure, both of which are valid even when complex machine learning tools are used for prediction. Through simulations, we demonstrate that our proposed procedures have good operating characteristics, and we illustrate their use by investigating the longitudinal importance of risk factors for suicide attempt.
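To make the idea of a variable importance trajectory and its summaries concrete, the following is a minimal sketch and not the procedure proposed here. It assumes importance at each time point is measured as the drop in held-out R^2 when one feature is removed, uses ordinary least squares as a stand-in for an arbitrary prediction algorithm, and summarizes the trajectory by its mean and its least-squares linear trend. The data-generating process, the function names, and the choice of summaries are all hypothetical, and the sketch is a simple plug-in estimate with no efficiency correction, confidence intervals, or hypothesis test.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Predictiveness measured as R^2 = 1 - MSE / Var(y)."""
    return 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)


def ols_predict(x_train, y_train, x_test):
    """Fit least squares with an intercept, then predict on x_test."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return a_test @ coef


def importance_at_time(x, y, drop_index, rng):
    """Held-out R^2 of the full model minus that of the reduced model."""
    n = len(y)
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]
    x_reduced = np.delete(x, drop_index, axis=1)
    full = r_squared(y[test], ols_predict(x[train], y[train], x[test]))
    reduced = r_squared(
        y[test], ols_predict(x_reduced[train], y[train], x_reduced[test])
    )
    return full - reduced


def summarize_trajectory(vim, times):
    """Two illustrative summaries: the mean and the linear trend slope."""
    slope, _intercept = np.polyfit(times, vim, deg=1)
    return {"mean": float(np.mean(vim)), "slope": float(slope)}


rng = np.random.default_rng(0)
times = np.arange(1, 6)
n = 2000

trajectory = []
for t in times:
    x = rng.normal(size=(n, 3))
    # Hypothetical mechanism: feature 0 matters more at later time points.
    y = 0.3 * t * x[:, 0] + x[:, 1] + rng.normal(size=n)
    trajectory.append(importance_at_time(x, y, drop_index=0, rng=rng))

trajectory = np.array(trajectory)
summary = summarize_trajectory(trajectory, times)
print(trajectory.round(3), summary)
```

Because feature 0's coefficient grows with time in this simulated example, the estimated trajectory should rise and the slope summary should be positive. Any other prediction algorithm could be substituted for `ols_predict` without changing how the trajectory is summarized.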