Explainable AI for Bioinformatics: Methods, Tools, and Applications

Artificial intelligence (AI) systems based on deep neural networks (DNNs) and machine learning (ML) algorithms are widely used to solve important problems in bioinformatics, biomedical informatics, and precision medicine. However, complex DNN and ML models are often perceived as opaque black boxes, which makes the reasoning behind their decisions difficult to understand. This lack of transparency is a challenge for end-users, decision-makers, and AI developers alike. Moreover, in sensitive areas such as healthcare, explainability and accountability are not only desirable but also legally required for AI systems that can significantly affect human lives. Fairness is another growing concern: algorithmic decisions should not show bias or discrimination against particular groups or individuals on the basis of sensitive attributes. Explainable artificial intelligence (XAI) aims to overcome the opaqueness of black-box models and to make transparent how AI systems reach their decisions. Interpretable ML models can explain how they make predictions and which factors influence their outcomes. However, most state-of-the-art interpretable ML methods are domain-agnostic and evolved from fields such as computer vision, automated reasoning, and statistics, so applying them directly to bioinformatics problems is challenging without customization and domain-specific adaptation. In this paper, we discuss the importance of explainability in the context of bioinformatics, provide an overview of model-specific and model-agnostic interpretable ML methods and tools, and outline their potential caveats and drawbacks. We also discuss how to customize existing interpretable ML methods for bioinformatics problems. Finally, we demonstrate through case studies in bioimaging, cancer genomics, and text mining how XAI methods can improve transparency.
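To make the notion of a model-agnostic method concrete: such a method treats the trained model purely as a callable that maps inputs to predictions and never inspects its internals. The sketch below implements permutation feature importance, one common model-agnostic technique chosen here for illustration (it is an assumption, not necessarily one of the methods covered in this paper). It assumes NumPy, and the names `permutation_importance`, `black_box`, and `accuracy` as well as the toy data are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Model-agnostic permutation feature importance (illustrative sketch).

    predict: any callable mapping an (n_samples, n_features) array to predictions.
    metric:  callable(y_true, y_pred) -> score, where higher is better.
    Returns the mean drop in score when each feature column is shuffled.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its association with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


# Hypothetical "black box": predicts class 1 when feature 0 exceeds 0.5;
# feature 1 is never used by the model.
def black_box(X):
    return (X[:, 0] > 0.5).astype(int)


rng = np.random.default_rng(42)
X = rng.random((500, 2))
y = (X[:, 0] > 0.5).astype(int)
scores = permutation_importance(black_box, X, y, accuracy)
print(scores)
```

Because the toy model ignores feature 1, shuffling that column leaves its predictions unchanged and its importance is zero, whereas shuffling feature 0 drops accuracy from 1.0 to roughly chance level. The same function works unchanged for any classifier that exposes a prediction callable, which is what "model-agnostic" means in practice.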
