Context: Machine learning (ML) and deep learning (DL) methods extract valuable insights from raw data in specific phases. The rise of continuous practices in software projects has drawn attention to automating Continuous Integration (CI) with these learning-based methods, and their growing adoption underscores the need to systematize the knowledge about them.

Objective: We aim to comprehensively review and analyze the existing literature on learning-based methods in the CI domain. We identify and analyze the techniques documented in the literature, emphasizing the fundamental attributes of the training phases of learning-based solutions in the context of CI.

Method: We conducted a Systematic Literature Review (SLR) of 52 primary studies. Through statistical and thematic analyses, we explored the correlations between CI tasks and the training phases of learning-based methods across the selected studies, covering a spectrum from data engineering techniques to evaluation metrics.

Results: This paper presents an analysis of the automation of CI tasks using learning-based methods. We identify and analyze nine types of data sources, four data preparation steps, four feature types, nine subsets of data features, five approaches for hyperparameter selection and tuning, and fifteen evaluation metrics. Furthermore, we discuss the latest techniques employed, existing gaps in CI task automation, and the characteristics of the learning-based techniques used.

Conclusion: This study provides a comprehensive overview of learning-based methods in CI, offering valuable insights for researchers and practitioners developing CI task automation. It also highlights the need for further research to advance these methods in CI.