The pattern of state changes in a biomedical time series can be related to health or disease. This work presents a principled approach for selecting a changepoint detection algorithm for a specific task, such as disease classification. Eight key algorithms were compared, and the performance of each was evaluated as a function of temporal tolerance, noise, and abnormal conduction (ectopy) on realistic artificial cardiovascular time series data. All algorithms were then applied to real data (cardiac time series from 22 patients with REM-behavior disorder (RBD) and 15 healthy controls) using the parameters selected on the artificial data. Finally, features were derived from the detected changepoints to distinguish RBD patients from healthy controls using a K-Nearest Neighbors approach. On artificial data, the Modified Bayesian Changepoint Detection algorithm provided the highest positive predictive value for state change identification, while Recursive Mean Difference Maximization (RMDM) achieved the highest true positive rate. For the classification task, features derived from the RMDM algorithm provided the highest leave-one-out cross-validated accuracy (0.89), with a true positive rate of 0.87. Automatically detected changepoints provide useful information about a subject's physiological state, which cannot be directly observed. However, the choice of changepoint detection algorithm depends on the nature of the underlying data and on the downstream application, such as a classification task. This work represents the first time changepoint detection algorithms have been compared in a meaningful way and utilized in a classification task, demonstrating the effect of changepoint algorithm choice on application performance.
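To make the evaluation as a function of temporal tolerance concrete, the following is a minimal sketch of how detected changepoints might be scored against known changepoints in artificial data. The function name `score_changepoints`, the greedy one-to-one matching rule, and the use of sample indices are assumptions made for illustration; the abstract does not specify the exact matching procedure used.

```python
def score_changepoints(true_cps, detected_cps, tolerance):
    """Score detections against true changepoints (hypothetical helper).

    A detection counts as a true positive if it lies within +/- `tolerance`
    samples of a true changepoint. Matching is greedy in time order, and
    each true changepoint and each detection is used at most once.
    Returns (true_positive_rate, positive_predictive_value).
    """
    true_sorted = sorted(true_cps)
    det_sorted = sorted(detected_cps)
    matched = 0
    i = j = 0
    while i < len(true_sorted) and j < len(det_sorted):
        diff = det_sorted[j] - true_sorted[i]
        if abs(diff) <= tolerance:
            # Detection and true changepoint agree within the tolerance.
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            # Detection is too early for this and every later true changepoint.
            j += 1
        else:
            # This true changepoint was missed; move on to the next one.
            i += 1
    tpr = matched / len(true_sorted) if true_sorted else 0.0
    ppv = matched / len(det_sorted) if det_sorted else 0.0
    return tpr, ppv


# Illustrative values only, not data from the study.
true_cps = [100, 250, 400]
detected_cps = [98, 260, 330, 405]
tight = score_changepoints(true_cps, detected_cps, tolerance=5)
loose = score_changepoints(true_cps, detected_cps, tolerance=10)
```

With these made-up inputs, widening the tolerance from 5 to 10 samples turns the detection at 260 into a match for the true changepoint at 250, which raises both rates. Sweeping `tolerance` in this way is one plausible reading of how performance can be reported as a function of temporal tolerance.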