Time-to-event analysis (survival analysis) is used when the response of interest is the time until a pre-specified event occurs. Time-to-event data are sometimes discrete, either because time itself is discrete or because failure times are grouped into intervals or measurements are rounded. In addition, an individual may fail from one of several distinct failure types, known as competing risks (events). Most methods and software packages for survival regression analysis assume that time is measured on a continuous scale. It is well known that naively applying standard continuous-time models to discrete-time data may yield biased estimators of the discrete-time model parameters. This paper introduces PyDTS, a Python package for simulating, estimating and evaluating semi-parametric competing-risks models for discrete-time survival data. The package implements a fast estimation procedure that makes it possible to include regularized regression methods such as LASSO and elastic net, among others. A simulation study showcases the flexibility and accuracy of the package. The utility of the package is demonstrated by analysing the Medical Information Mart for Intensive Care (MIMIC)-IV dataset to predict length of hospital stay.
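To make the discrete-time competing-risks setting concrete, the following minimal sketch computes covariate-free empirical cause-specific hazards, i.e. the number of type-j failures at time t divided by the number of individuals still at risk at t. It is an illustration only and does not use or reproduce the PyDTS API; the function name `empirical_cause_specific_hazards`, the event coding (0 for censoring, 1..J for failure types) and the convention that individuals censored at t are counted in the risk set at t are all assumptions made for this example.

```python
import numpy as np


def empirical_cause_specific_hazards(times, events, n_causes):
    """Covariate-free estimate of discrete cause-specific hazards.

    times    : observed discrete times, integers in 1..d
    events   : 0 = censored, j in 1..n_causes = failure of type j
    returns  : array of shape (n_causes, d); entry [j-1, t-1] is the
               estimated hazard of failure type j at time t
    """
    times = np.asarray(times)
    events = np.asarray(events)
    d = int(times.max())
    hazards = np.zeros((n_causes, d))
    for t in range(1, d + 1):
        # Assumed convention: everyone with observed time >= t is at risk at t
        at_risk = np.sum(times >= t)
        for j in range(1, n_causes + 1):
            n_events = np.sum((times == t) & (events == j))
            hazards[j - 1, t - 1] = n_events / at_risk if at_risk > 0 else 0.0
    return hazards


# Hypothetical toy data: six individuals, two competing failure types
times = [1, 1, 2, 2, 3, 3]
events = [1, 0, 2, 1, 0, 2]
hazards = empirical_cause_specific_hazards(times, events, n_causes=2)
print(hazards)
```

A regression model replaces these raw proportions with hazards that depend on covariates, but the same risk-set bookkeeping underlies both.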