A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research

The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their application to critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper surveys this interdisciplinary domain through a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review covers work appearing in the most prominent SE and AI conferences and journals, and spans 63 papers across 21 unique SE tasks. Guided by three key Research Questions (RQs), we aim to (1) summarize the SE tasks where explainable AI (XAI) techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identify a set of challenges that existing studies have yet to address, together with a roadmap highlighting opportunities we consider appropriate and important for future work.

