Sepsis remains one of the most complex and heterogeneous syndromes in intensive care, characterized by diverse physiological trajectories and variable responses to treatment. While deep learning models perform well in the early prediction of sepsis, they often lack interpretability and ignore latent patient sub-phenotypes. In this work, we propose a machine learning framework that addresses this issue through a new avenue: a relational approach. Temporal data from electronic medical records (EMRs) are viewed as multivariate patient logs and represented in a relational data schema. A propositionalisation technique, based on classic aggregation and selection functions from the field of relational data, is then applied to construct interpretable features that "flatten" the data. Finally, the flattened data are classified using a selective naive Bayesian classifier. Experimental validation demonstrates the relevance of the proposed approach as well as its high degree of interpretability. The interpretation is fourfold: univariate, global, local, and counterfactual.
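The propositionalisation step can be illustrated with a minimal sketch. It assumes a hypothetical patient log stored as a secondary table of `(patient_id, hour, variable, value)` rows linked to a root patient table, and a hypothetical set of aggregates (count, mean, min, max, last) plus one threshold-based selection followed by a count. This is not the authors' implementation; the feature constructions used in the actual framework may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical patient log: a one-to-many secondary table keyed by patient_id.
LOG = [
    ("p1", 0, "HR", 88.0),
    ("p1", 1, "HR", 112.0),
    ("p1", 1, "Temp", 38.6),
    ("p2", 0, "HR", 72.0),
    ("p2", 2, "Temp", 36.9),
]


def propositionalise(log, thresholds):
    """Flatten a one-to-many log into one feature dict per patient.

    Each feature applies an aggregate to one variable's values; for variables
    listed in `thresholds` (a hypothetical parameter), a selection (value > t)
    is applied first and the selected rows are counted. Variables a patient
    never recorded simply yield no feature for that patient.
    """
    # Group the log by patient, then by variable.
    grouped = defaultdict(lambda: defaultdict(list))
    for pid, hour, var, val in log:
        grouped[pid][var].append((hour, val))

    rows = {}
    for pid, by_var in grouped.items():
        feats = {}
        for var, series in by_var.items():
            series.sort()  # chronological order, so "Last" is well defined
            values = [v for _, v in series]
            feats[f"Count({var})"] = len(values)
            feats[f"Mean({var})"] = mean(values)
            feats[f"Min({var})"] = min(values)
            feats[f"Max({var})"] = max(values)
            feats[f"Last({var})"] = values[-1]
            if var in thresholds:
                t = thresholds[var]
                # Selection followed by aggregation: count readings above t.
                feats[f"Count({var} > {t})"] = sum(v > t for v in values)
        rows[pid] = feats
    return rows


features = propositionalise(LOG, {"HR": 100.0})
print(features["p1"]["Mean(HR)"])  # 100.0
```

Each flattened row can then be passed to a tabular classifier; the framework uses a selective naive Bayesian classifier, which this sketch does not reproduce. Because every feature name spells out the aggregate and selection that produced it, a feature such as `Count(HR > 100.0)` stays directly readable.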