Machine learning (ML) methods have been developing rapidly, but selecting and configuring the right methods to achieve a desired performance is increasingly difficult and tedious. To address this challenge, automated machine learning (AutoML) has emerged; it aims to generate satisfactory ML configurations for given tasks in a data-driven way. In this paper, we provide a comprehensive survey of this topic. We begin with the formal definition of AutoML and then introduce its principles, including the bi-level learning objective, the learning strategy, and the theoretical interpretation. Next, we summarize AutoML practices by organizing existing works into a taxonomy based on three main factors: the search space, the search algorithm, and the evaluation strategy, and we explain each category through representative methods. We then illustrate the principles and practices with exemplary applications: configuring ML pipelines, one-shot neural architecture search, and integration with foundation models. Finally, we highlight emerging directions of AutoML and conclude the survey.
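To make the three factors concrete, the following is a minimal sketch, not the formulation used in the survey. It assumes a toy one-dimensional regression task, a hypothetical two-parameter search space, plain random search as the search algorithm, and hold-out validation as the evaluation strategy. All names (`SEARCH_SPACE`, `train`, `evaluate`, `random_search`) are illustrative. The inner call to `train` plays the role of the lower-level problem, and the outer loop over configurations plays the role of the upper-level problem.

```python
import random

# Hypothetical search space: names and candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3],
    "l2_penalty": [0.0, 0.01, 0.1],
}


def make_data(n, rng):
    """Toy 1-D regression data: y = 2x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


def train(config, train_set, epochs=50):
    """Lower level: fit a single weight w by gradient descent on the training loss."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in train_set) / len(train_set)
        grad += 2.0 * config["l2_penalty"] * w  # gradient of the L2 regularizer
        w -= config["learning_rate"] * grad
    return w


def evaluate(w, val_set):
    """Evaluation strategy: mean squared error on a held-out validation set."""
    return sum((w * x - y) ** 2 for x, y in val_set) / len(val_set)


def random_search(train_set, val_set, n_trials, rng):
    """Search algorithm (upper level): sample configurations and keep the best."""
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        loss = evaluate(train(config, train_set), val_set)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


rng = random.Random(0)
train_set, val_set = make_data(80, rng), make_data(40, rng)
best_config, best_loss = random_search(train_set, val_set, n_trials=20, rng=rng)
print(best_config, best_loss)
```

Swapping any one of the three pieces (for example, a larger search space, a model-based search algorithm, or cross-validation in place of a single hold-out split) leaves the other two unchanged, which is the separation the taxonomy relies on.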