Assessing the Use of AutoML for Data-Driven Software Engineering

Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is emerging as a promising solution to fill the AI/ML skills gap, since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, little is known about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two software engineering (SE) datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. However, our findings also show that the currently available AutoML solutions do not live up to their names, as they do not equally support automation across the stages of the ML development workflow or for all team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.
