Motivation: We explored how explainable AI (XAI) can help shed light on the inner workings of neural networks for protein function prediction. To this end, we extended the widely used XAI method of integrated gradients so that latent representations inside transformer models, which were fine-tuned for Gene Ontology term and Enzyme Commission number prediction, can be inspected as well. Results: The approach enabled us to identify the amino acids in the sequences to which the transformers pay particular attention, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside the model. Inside the model, we identified transformer heads whose attribution maps correspond, with statistical significance, to ground-truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
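The integrated gradients method named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy logistic model, its weights, the all-zero baseline, and the function names `model`, `model_gradient`, and `integrated_gradients` are assumptions made for illustration only. The sketch approximates the path integral of the gradient between a baseline and the input with a midpoint Riemann sum. In principle, the same computation could be applied with a latent representation in place of the input vector, which is the kind of extension the abstract describes.

```python
import math

# Hypothetical toy model (an assumption, not the paper's transformer):
# a logistic unit over three input features.
WEIGHTS = [0.5, -1.0, 2.0]


def model(x):
    """Return the sigmoid of the weighted sum of the features."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_gradient(x):
    """Analytic gradient of the toy model with respect to its input."""
    s = model(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from `baseline` to `x`
    and scaled by the difference (x - baseline), feature by feature.
    """
    summed = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        summed = [s + g for s, g in zip(summed, grads)]
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, summed)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # all-zero reference input (an assumption)
attributions = integrated_gradients(x, baseline, model_gradient)

# Completeness check: attributions should sum to the output difference.
output_difference = model(x) - model(baseline)
print(attributions, sum(attributions), output_difference)
```

A useful sanity check for any such implementation is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.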