SIBILA: A novel interpretable ensemble of general-purpose machine learning models applied to medical contexts

Personalized medicine remains a major challenge for scientists. The rapid growth of machine learning and deep learning has made them a feasible alternative for predicting the most appropriate therapy for individual patients. However, many researchers are reluctant to use these methods because a custom model must be developed for every dataset, their results are difficult to interpret, and their computational requirements are high. SIBILA has been developed to save time and to shed light on how models work internally. SIBILA is an ensemble of machine learning and deep learning models that applies a range of interpretability algorithms to identify the most relevant input features. Since the interpretability algorithms may not agree with each other, a consensus stage has been implemented to estimate the global attribution of each variable to the predictions. SIBILA is containerized so that it can be run on any high-performance computing platform. Although it was conceived as a command-line tool, it is also freely available to all users as a web server at https://bio-hpc.ucam.edu/sibila, so even users with limited technical skills can take advantage of it. SIBILA has been applied to two medical case studies to demonstrate its predictive ability in classification problems. Although it is a general-purpose tool that can be used in many other domains, it has been developed with the aim of becoming a powerful decision-making tool for clinicians. Two additional non-medical examples are therefore supplied as supplementary material to show that SIBILA also performs well on noisy data and in regression problems.
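
The abstract does not state how the consensus stage combines the outputs of the different interpretability algorithms. As a minimal sketch, assuming each algorithm yields one attribution score per input feature, one simple option is to rescale each algorithm's absolute scores so they sum to 1 and then average them across algorithms. The function name `consensus_attribution`, the method names and the toy values are hypothetical; this is not SIBILA's API, its actual consensus method, or any of its results.

```python
from __future__ import annotations

from statistics import mean


def consensus_attribution(attributions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine per-method feature attributions into one global score per feature.

    Hypothetical scheme (not necessarily SIBILA's): each method's absolute
    attributions are rescaled to sum to 1, then averaged across methods.
    """
    # Union of all features reported by any method; missing scores count as 0.
    features = sorted({f for scores in attributions.values() for f in scores})
    normalized = []
    for scores in attributions.values():
        total = sum(abs(v) for v in scores.values())
        if total == 0:
            # A method that attributes nothing contributes zeros instead of dividing by zero.
            normalized.append({f: 0.0 for f in features})
            continue
        normalized.append({f: abs(scores.get(f, 0.0)) / total for f in features})
    # Average the rescaled attribution of each feature over all methods.
    combined = {f: mean(n[f] for n in normalized) for f in features}
    # Return features ordered from highest to lowest consensus attribution.
    return dict(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Toy attributions from two hypothetical interpretability methods.
    toy = {
        "permutation": {"age": 0.30, "bmi": 0.10, "glucose": 0.60},
        "shap": {"age": -0.20, "bmi": 0.20, "glucose": 0.60},
    }
    print(consensus_attribution(toy))
```

Taking absolute values treats a strongly negative attribution as just as relevant as a strongly positive one, and the per-method rescaling keeps any single algorithm with large raw scores from dominating the consensus. Rank averaging would be an equally plausible alternative under different assumptions.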

翻译：个性化医疗仍是科学家面临的重大挑战。机器学习和深度学习的快速发展使其成为预测个体患者最佳疗法的可行替代方案。然而，为每个数据集定制模型的需求、其结果缺乏可解释性以及高昂的计算开销，令许多人对此类方法持保留态度。为节省开发时间并揭示模型的内部工作机制，我们研发了SIBILA。SIBILA是一种集成机器学习和深度学习模型的集合方法，通过应用一系列可解释性算法来识别最相关的输入特征。鉴于不同可解释性算法可能产生不一致的结果，我们引入了一个共识阶段，以评估各变量对预测结果的全局贡献。SIBILA已实现容器化，可在任何高性能计算平台上运行。尽管被设计为命令行工具，但它也以网页服务器的形式免费向所有用户开放（访问地址：https://bio-hpc.ucam.edu/sibila），这使得即使技术能力有限的用户也能充分利用其功能。SIBILA已被应用于两个医学案例研究，以展示其在分类问题中的预测能力。虽然这是一个通用型工具，但其开发目标是为临床医生提供强大的决策支持工具，不过实际上它可应用于许多其他领域。因此，我们另提供两个非医学案例作为补充材料，以证明SIBILA在含噪声数据和回归问题中仍能表现良好。