Artificial neural networks show promising performance in detecting correlations within data that are associated with specific outcomes. However, the black-box nature of such models can hinder the advancement of knowledge in research fields by obscuring the decision process and preventing scientists from fully conceptualizing predicted outcomes. Furthermore, domain experts such as healthcare providers need explainable predictions to assess whether a predicted outcome can be trusted in high-stakes scenarios and to help them integrate a model into their own routine. Interpretable models therefore play a crucial role in the incorporation of machine learning into high-stakes domains like healthcare. In this paper we introduce Convolutional Motif Kernel Networks, a neural network architecture that learns a feature representation within a subspace of the reproducing kernel Hilbert space of the position-aware motif kernel function. The resulting model makes it possible to directly interpret and evaluate prediction outcomes by providing a biologically and medically meaningful explanation without the need for additional post-hoc analysis. We show that our model is able to learn robustly on small datasets and reaches state-of-the-art performance on relevant healthcare prediction tasks. Our proposed method can be applied to DNA and protein sequences. Furthermore, we show that the proposed method learns biologically meaningful concepts directly from data using an end-to-end learning scheme.