Feature engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which treat it as a black-box search operating within rigid, predefined search spaces and lacking domain awareness. While Large Language Models (LLMs) offer a promising alternative, leveraging semantic reasoning to generate an unbounded space of operators, existing LLM-based methods fail to construct free-form FE pipelines, remaining confined to isolated subtasks such as feature generation. More critically, they are rarely optimized jointly with the hyperparameter optimization (HPO) of the downstream ML model, leading to greedy "FE-then-HPO" workflows that cannot capture the strong interactions between FE and HPO. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH combines an LLM-driven FE optimizer powered by Tree of Thought (ToT) prompting to explore flexible FE pipelines, a Bayesian optimization (BO) module for HPO, and a dynamic optimizer selector that realizes interleaved optimization by adaptively scheduling FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between the LLM and the BO module, enabling mutually informed decisions. Experiments show that CoFEH not only outperforms traditional and LLM-based FE baselines but also achieves superior end-to-end performance under joint optimization.
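The interleaved FE/HPO loop described in the abstract can be sketched as follows. This is a toy illustration, not CoFEH's implementation: the LLM/ToT feature-engineering step, the Bayesian optimization step, and the dynamic selector are replaced by random stand-ins so the control flow is self-contained and runnable, and all function names and the objective are hypothetical.

```python
import random

random.seed(0)

def evaluate(pipeline, hparams):
    """Toy objective standing in for validation performance:
    rewards longer FE pipelines and a learning rate near 0.5."""
    return len(pipeline) * 0.1 - abs(hparams["lr"] - 0.5)

def llm_fe_step(pipeline, context):
    """Stand-in for an LLM/ToT feature-engineering proposal.
    In CoFEH this would be conditioned on the shared context
    (mutual conditioning); here it appends a random operator."""
    op = random.choice(["log", "bin", "cross", "scale"])
    return pipeline + [op]

def bo_hpo_step(hparams, context):
    """Stand-in for one Bayesian-optimization HPO step (a random
    perturbation here), likewise conditioned on shared context."""
    lr = min(1.0, max(0.0, hparams["lr"] + random.uniform(-0.1, 0.1)))
    return {"lr": lr}

def select_optimizer(history):
    """Placeholder for the dynamic optimizer selector: CoFEH schedules
    FE and HPO steps adaptively; here we simply alternate."""
    return "fe" if len(history) % 2 == 0 else "hpo"

pipeline, hparams = [], {"lr": 0.1}
context = {"pipeline": pipeline, "hparams": hparams}  # shared context
best, history = evaluate(pipeline, hparams), []

for _ in range(20):
    if select_optimizer(history) == "fe":
        cand_p, cand_h = llm_fe_step(pipeline, context), hparams
    else:
        cand_p, cand_h = pipeline, bo_hpo_step(hparams, context)
    score = evaluate(cand_p, cand_h)
    history.append(score)
    if score >= best:  # keep only non-degrading configurations
        best, pipeline, hparams = score, cand_p, cand_h
        # refresh the shared context so each optimizer sees the
        # other's latest accepted decision
        context = {"pipeline": pipeline, "hparams": hparams}

print(round(best, 3))
```

The point of the sketch is the control structure: FE and HPO proposals are generated against one shared, continuously updated context rather than in a one-shot "FE-then-HPO" sequence.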