Epilepsy is a chronic neurological disorder that affects a significant portion of the human population and poses serious risks to patients' daily lives. Despite advances in machine learning and IoT, small, nonstigmatizing wearable devices for continuous monitoring and seizure detection in outpatient environments are not yet available. Part of the reason is the complexity of epilepsy itself, including highly imbalanced data, its multimodal nature, and very subject-specific signatures. Another problem, however, is the heterogeneity of methodological approaches in research, which slows progress, makes results difficult to compare, and lowers reproducibility. This article therefore identifies a wide range of methodological decisions that must be made and reported when training epilepsy detection systems and evaluating their performance. We characterize the influence of individual choices using a typical ensemble random-forest model and the publicly available CHB-MIT database, providing a broader picture of each decision and, where possible, giving good-practice recommendations based on our experience.