Automated Feature Engineering (AutoFE) has become an important step in machine learning projects, as it can improve model performance and yield additional information for statistical analysis. However, most current AutoFE approaches either rely on manual feature creation or generate a large number of features, which can be computationally expensive and lead to overfitting. To address these challenges, we propose FeatGeNN, a novel convolutional method that extracts and creates new features using correlation as a pooling function. Unlike traditional pooling functions such as max-pooling, correlation-based pooling considers the linear relationships between the features in the data matrix, making it better suited to tabular data. We evaluate our method on various benchmark datasets and demonstrate that FeatGeNN outperforms existing AutoFE approaches in terms of model performance. Our results suggest that correlation-based pooling can be a promising alternative to max-pooling for AutoFE on tabular data.
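To make the contrast with max-pooling concrete, the following is a minimal sketch of one possible reading of correlation-based pooling. It is not the FeatGeNN implementation, which the abstract does not specify. It assumes that pooling operates over non-overlapping windows of candidate feature columns and keeps, in each window, the column with the highest absolute Pearson correlation with the target. The names `abs_pearson`, `correlation_pool`, and `window` are hypothetical, and the data is synthetic.

```python
import numpy as np


def abs_pearson(x, y):
    """Absolute Pearson correlation of two 1-D arrays (0.0 if either is constant)."""
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        return 0.0
    return abs(float((xc * yc).sum() / denom))


def correlation_pool(features, target, window=2):
    """Hypothetical pooling step (an assumption, not the paper's definition).

    Splits the columns of `features` into non-overlapping windows and keeps,
    per window, the column most linearly related to `target`.
    Returns the pooled matrix and the indices of the kept columns.
    """
    n_cols = features.shape[1]
    kept = []
    for start in range(0, n_cols, window):
        cols = range(start, min(start + window, n_cols))
        scores = [abs_pearson(features[:, j], target) for j in cols]
        kept.append(start + int(np.argmax(scores)))
    return features[:, kept], kept


# Synthetic example: only columns 1 and 2 carry signal about y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 1] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

pooled, kept = correlation_pool(X, y, window=2)
print(kept, pooled.shape)
```

Max-pooling over the same windows would select entries by magnitude alone, regardless of whether a column is related to the target. Selecting by correlation instead ties the pooling decision to a linear relationship in the data, which is the property the abstract highlights for tabular inputs.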