Recently, program synthesis driven by large language models (LLMs) has become increasingly popular. However, program synthesis for machine learning (ML) tasks still poses significant challenges. This paper explores a novel form of program synthesis, targeting ML programs, by combining LLMs and automated machine learning (AutoML). Specifically, our goal is to fully automate the generation and optimization of code for the entire ML workflow, from data preparation to modeling and post-processing, using only a textual description of the ML task. To manage the length and diversity of ML programs, we propose to break each ML program into smaller, manageable parts; each part is generated separately by the LLM, with careful attention to compatibility among parts. To ensure compatibility, we design a testing technique for ML programs. Unlike traditional program synthesis, which typically relies on binary evaluations (i.e., correct or incorrect), evaluating ML programs requires more than binary judgments. We therefore also assess ML programs numerically and select the optimal programs from a range of candidates using AutoML methods. In experiments across various ML tasks, our method outperforms existing methods in 10 out of 12 tasks for generating ML programs, and AutoML significantly improves the performance of the generated programs. Given only the textual task description, our method, Text-to-ML, generates a complete and optimized ML program in a fully autonomous process.
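The pipeline described above (decompose the workflow into parts, generate candidates for each part with an LLM, test their compatibility, score assembled programs numerically, and keep the best) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_generate`, `compatible`, and `score` are hypothetical placeholders standing in for the LLM call, the compatibility test, and the numeric AutoML-style evaluation, respectively.

```python
import random

# Workflow parts, following the data-preparation / modeling / post-processing
# decomposition named in the abstract.
PARTS = ["data_preparation", "modeling", "post_processing"]

def llm_generate(part, n_candidates=3):
    # Placeholder: a real system would prompt an LLM with the task
    # description and return candidate code snippets for this part.
    return [f"# {part} candidate {i}" for i in range(n_candidates)]

def compatible(parts):
    # Placeholder compatibility test: a real system would, e.g., run the
    # assembled program on a small data sample and check it executes.
    return True

def score(program):
    # Placeholder numeric evaluation, e.g., a cross-validation metric.
    return random.random()

def synthesize(task_description, trials=5):
    """Generate several candidate programs and return the best-scoring one."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        parts = [random.choice(llm_generate(p)) for p in PARTS]
        if not compatible(parts):
            continue  # discard incompatible combinations
        program = "\n".join(parts)
        s = score(program)
        if s > best_score:
            best, best_score = program, s
    return best, best_score
```

With the placeholders replaced by real LLM calls, a runnable compatibility check, and a task-appropriate metric, this loop captures the selection-over-candidates structure the abstract describes.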