Large language models (LLMs) are increasingly used to automate feature engineering in tabular learning. Given task-specific information, LLMs can propose diverse feature transformation operations to improve downstream model performance. However, current approaches typically treat the LLM as a black-box optimizer responsible for both proposing and selecting operations based solely on its internal heuristics. These heuristics often lack calibrated estimates of operation utility, leading to repeated exploration of low-yield operations without a principled strategy for prioritizing promising directions. In this paper, we propose a human-LLM collaborative feature engineering framework for tabular learning. We first decouple the proposal and selection of transformation operations: the LLM is used solely to generate operation candidates, while selection is guided by explicitly modeling the utility and uncertainty of each proposed operation. Because accurate utility estimation is difficult, especially in the early rounds of feature engineering, we design a mechanism that selectively elicits human expert preference feedback, i.e., comparisons of which operations are more promising, and incorporates it into the selection process to help identify more effective operations. Evaluations in both a synthetic study and a real user study demonstrate that the proposed framework improves feature engineering performance across a variety of tabular datasets and reduces users' cognitive load during the feature engineering process.
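The selection mechanism described above could be sketched as follows. This is a minimal hypothetical illustration, not the paper's actual method: it assumes Gaussian posteriors over each operation's utility, a UCB-style acquisition rule for selection, and a logistic (Bradley-Terry-style) update for pairwise human preference feedback — none of these specifics are given in the abstract.

```python
import math


class OperationSelector:
    """Illustrative sketch: select among LLM-proposed feature-engineering
    operations by modeling each operation's utility (posterior mean) and
    uncertainty (posterior variance). Hypothetical; details are assumed."""

    def __init__(self, operations, prior_mean=0.0, prior_var=1.0, beta=1.0):
        self.beta = beta  # exploration weight in the UCB acquisition rule
        self.stats = {op: {"mean": prior_mean, "var": prior_var}
                      for op in operations}

    def select(self):
        # Pick the operation maximizing mean + beta * sqrt(variance),
        # i.e., an upper-confidence-bound score.
        return max(self.stats,
                   key=lambda op: self.stats[op]["mean"]
                   + self.beta * math.sqrt(self.stats[op]["var"]))

    def update_utility(self, op, observed_gain, noise_var=0.25):
        # Conjugate Gaussian update of the utility posterior after
        # observing the downstream performance gain of applying `op`.
        s = self.stats[op]
        precision = 1.0 / s["var"] + 1.0 / noise_var
        s["mean"] = (s["mean"] / s["var"]
                     + observed_gain / noise_var) / precision
        s["var"] = 1.0 / precision

    def update_preference(self, preferred, other, step=0.1):
        # Pairwise human feedback ("preferred is more promising than
        # other"): one logistic-model gradient step nudging the two
        # utility means apart, Bradley-Terry-style.
        d = self.stats[preferred]["mean"] - self.stats[other]["mean"]
        p = 1.0 / (1.0 + math.exp(-d))  # predicted prob. of this preference
        self.stats[preferred]["mean"] += step * (1.0 - p)
        self.stats[other]["mean"] -= step * (1.0 - p)
```

Early on, when no utility observations exist, the expert's pairwise comparisons shift the means directly, steering selection before calibrated estimates are available; once observed gains accumulate, the Gaussian updates dominate.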