Traceable Group-Wise Self-Optimizing Feature Transformation Learning: A Dual Optimization Perspective

Feature transformation aims to reconstruct an effective representation space by mathematically refining the existing features. It serves as a pivotal approach to combat the curse of dimensionality, enhance model generalization, mitigate data sparsity, and extend the applicability of classical models. Existing research predominantly focuses on domain knowledge-based feature engineering or on learning latent representations. However, these methods, while insightful, lack full automation and fail to yield a traceable and optimal representation space. This raises a key question: can we address these limitations concurrently when reconstructing a feature space for a machine-learning task? Our initial work took a pioneering step towards this challenge by introducing a novel self-optimizing framework. This framework leverages three cascading reinforced agents to automatically select candidate features and operations for generating improved feature transformation combinations. Despite the progress made, there was room to enhance its effectiveness and generalization capability. In this extended journal version, we advance our initial work from two distinct yet interconnected perspectives: 1) We propose a refinement of the original framework, which integrates a graph-based state representation method to capture feature interactions more effectively and develops different Q-learning strategies to further alleviate Q-value overestimation. 2) We utilize a new optimization technique (actor-critic) to train the entire self-optimizing framework in order to accelerate model convergence and improve feature transformation performance. Finally, to validate the improved effectiveness and generalization capability of our framework, we perform extensive experiments and conduct comprehensive analyses.
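To make the notion of a traceable transformation concrete, the following is a minimal Python sketch of a single transformation step. It is an illustration under stated assumptions, not the framework itself: the operation sets, the function name transform_step, and the reading of the three agents' decisions as a (head feature, operation, optional tail feature) triple are all hypothetical, and the choices are supplied by hand rather than by trained agents. The point it shows is that each generated feature can carry its own derivation in its name, so the resulting space stays explainable.

```python
import math

# Hypothetical operation sets; the abstract does not enumerate the real ones.
UNARY_OPS = {
    "square": lambda x: x * x,
    "sqrt_abs": lambda x: math.sqrt(abs(x)),
}
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def transform_step(columns, head, op, tail=None):
    """Apply one operation and return (new_columns, new_feature_name).

    `columns` maps a feature name to a list of values. In the framework the
    (head, op, tail) choice would come from the three agents; here it is
    passed in directly (an assumption made for illustration).
    """
    if op in UNARY_OPS:
        new_name = f"{op}({head})"
        values = [UNARY_OPS[op](v) for v in columns[head]]
    elif op in BINARY_OPS:
        if tail is None:
            raise ValueError(f"binary operation {op!r} needs a tail feature")
        new_name = f"{op}({head},{tail})"
        values = [BINARY_OPS[op](a, b)
                  for a, b in zip(columns[head], columns[tail])]
    else:
        raise KeyError(f"unknown operation {op!r}")

    # Copy so the original feature space is left unchanged.
    new_columns = dict(columns)
    new_columns[new_name] = values
    return new_columns, new_name


if __name__ == "__main__":
    data = {"f1": [1.0, 2.0, 3.0], "f2": [4.0, 5.0, 6.0]}
    data, crossed = transform_step(data, "f1", "mul", "f2")
    data, squared = transform_step(data, crossed, "square")
    print(squared, data[squared])
```

Because every new column is keyed by the expression that produced it, composing steps yields nested names such as square(mul(f1,f2)), which is one simple way to keep the transformed space traceable back to the original features.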
