A Systematic Literature Review of Explainable AI for Software Engineering

Context: In recent years, machine learning (ML) techniques have become one of the main approaches to tackling software engineering (SE) tasks in research studies (ML4SE). This has been achieved by using state-of-the-art models that tend to be more complex and black-box, which has led to less explainable solutions and, in turn, reduced trust in and uptake of ML4SE solutions by industry professionals. Objective: One potential remedy is to use explainable AI (XAI) methods to provide the missing explainability. In this paper, we explore to what extent XAI has been studied in the SE community (XAI4SE) and provide a comprehensive view of the current state of the art, as well as the challenges and a roadmap for future work. Method: We conduct a systematic literature review of the 24 most relevant published studies in XAI4SE, selected from 869 primary studies retrieved by keyword search. We pose three research questions, which we answer through a meta-analysis of the data collected from each paper. Results: Our study reveals that, among the identified studies, software maintenance (68%), and defect prediction in particular, has the highest share of the SE stages and tasks being studied. Additionally, we found that XAI methods were mainly applied to classic ML models rather than to more complex models. We also noticed a clear lack of standard evaluation metrics for XAI methods in the literature, which has caused confusion among researchers and a lack of benchmarks for comparison. Conclusions: Most of the studies covered in this systematic review identify XAI as a helpful tool. However, XAI4SE is a relatively new domain with considerable untapped potential, including the SE tasks to support, the ML4SE methods to explain, and the types of explanations to offer. This study encourages researchers to work on the challenges and roadmap identified in the paper.
