OpenFE: Automated Feature Generation with Expert-level Performance

The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving learning performance on tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that delivers results competitive with those of machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions, each with thousands of participating data science teams. In the two competitions, features generated by OpenFE combined with a simple baseline model beat 99.3% and 99.6% of the data science teams, respectively. In addition to the empirical results, we provide a theoretical perspective showing that feature generation can be beneficial in a simple yet representative setting. The code is available at https://github.com/ZhangTP1996/OpenFE.
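
The abstract only names OpenFE's two components, so here is a minimal sketch of the underlying ideas under assumptions that are ours, not the paper's. A least-squares linear model stands in for the base learner. "Feature boosting" is read as: freeze the base model, fit a small correction to its residuals using only the candidate feature, and score the candidate by the drop in validation error, so the base model is never retrained per candidate. The candidate pool is limited to pairwise products and sums. The coarse pruning stage is a successive-halving loop over growing data subsamples, and the fine stage simply rescores the survivors on all rows. All function names (fit_linear, boosting_gain, two_stage_prune, and the others) are hypothetical and are not the OpenFE library's API.

```python
import itertools
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept (stand-in base learner; an assumption)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def generate_candidates(X):
    """Hypothetical candidate pool: pairwise products and sums of base columns."""
    cands = {}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cands[f"x{i}*x{j}"] = X[:, i] * X[:, j]
        cands[f"x{i}+x{j}"] = X[:, i] + X[:, j]
    return cands


def boosting_gain(f, resid, train_idx, valid_idx):
    """Incremental gain of one candidate: fit a correction to the frozen base
    model's residuals using only this candidate, then return the drop in
    validation MSE. The base model itself is never refit."""
    w = fit_linear(f[train_idx].reshape(-1, 1), resid[train_idx])
    correction = predict_linear(w, f[valid_idx].reshape(-1, 1))
    before = np.mean(resid[valid_idx] ** 2)
    after = np.mean((resid[valid_idx] - correction) ** 2)
    return before - after


def two_stage_prune(cands, resid, rng, start=64, top_k=3):
    """Coarse stage: successive halving of candidates on growing subsamples.
    Fine stage: rescore survivors on all rows and keep up to top_k positive gains."""
    n = len(resid)
    names = list(cands)
    size = start
    while len(names) > 2 * top_k and size < n:
        idx = rng.choice(n, size=size, replace=False)
        half = size // 2
        scores = {k: boosting_gain(cands[k], resid, idx[:half], idx[half:]) for k in names}
        keep = max(top_k, len(names) // 2)
        names = sorted(names, key=scores.get, reverse=True)[:keep]
        size *= 2  # cheap, noisy scores first; more data for fewer candidates
    perm = rng.permutation(n)
    half = n // 2
    final = {k: boosting_gain(cands[k], resid, perm[:half], perm[half:]) for k in names}
    ranked = sorted(final, key=final.get, reverse=True)
    return [k for k in ranked if final[k] > 0][:top_k]
```

A usage example on synthetic data (also an assumption) whose target contains an interaction that a linear base model cannot capture:

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = X[:, 0] + 2 * X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(2000)
resid = y - predict_linear(fit_linear(X, y), X)
print(two_stage_prune(generate_candidates(X), resid, rng))
```

On this target the product x1*x2 should rank first, while sums such as x0+x1 add essentially nothing because they are already linear combinations of the base features.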
