Before applying data analytics or machine learning to a dataset, a vital step is usually the construction of an informative set of features from the data. In this paper, we present SMARTFEAT, an efficient automated feature engineering tool that assists data users, including non-experts, in constructing useful features. Leveraging Foundation Models (FMs), our approach creates new features from the data based on contextual information and open-world knowledge. It incorporates an intelligent operator selector that identifies a subset of operators, avoiding the exhaustive combination of original features that is typical of traditional automated feature engineering tools. Moreover, we address the limitations of performing data tasks through row-level interactions with FMs, which can incur significant delays and costs due to excessive API calls. We introduce a function generator that obtains efficient data transformations, such as dataframe built-in methods or lambda functions, so that SMARTFEAT can generate new features for large datasets. Code repo with prompt details and datasets: https://github.com/niceIrene/SMARTFEAT.
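The cost argument for a function generator over row-level FM interaction can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `MockFM`, its methods `label_row` and `generate_function`, and the BMI feature are hypothetical stand-ins, not SMARTFEAT's actual interface or prompts. The mock only counts calls, to show that the row-level approach needs one call per row while the generated function needs one call in total.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class MockFM:
    """Hypothetical stand-in for a foundation model API; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def label_row(self, row: Row) -> float:
        # Row-level interaction: every row costs one API call.
        self.calls += 1
        return row["weight_kg"] / row["height_m"] ** 2

    def generate_function(self, description: str) -> Callable[[Row], float]:
        # Function generation: a single call returns a reusable transformation.
        self.calls += 1
        return lambda row: row["weight_kg"] / row["height_m"] ** 2


rows: List[Row] = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": 82.0, "height_m": 1.80},
    {"weight_kg": 55.0, "height_m": 1.60},
]

# Approach 1: ask the model for the new feature value row by row.
row_level_fm = MockFM()
bmi_row_level = [row_level_fm.label_row(r) for r in rows]

# Approach 2: ask the model once for a function, then apply it locally.
generator_fm = MockFM()
bmi_fn = generator_fm.generate_function("body mass index from weight and height")
bmi_generated = [bmi_fn(r) for r in rows]

print("row-level calls:", row_level_fm.calls)
print("function-generator calls:", generator_fm.calls)
```

In this sketch the number of model calls in the first approach grows with the number of rows, whereas the second stays constant, which is the property that matters for large datasets.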