Enhancing Tabular Data Optimization with a Flexible Graph-based Reinforced Exploration Strategy

Tabular data optimization methods aim to automatically find an optimal feature transformation process that generates high-value features and improves the performance of downstream machine learning tasks. Current frameworks for automated feature transformation treat the problem as iterative sequence generation and optimize their decision strategies through performance feedback from downstream tasks. However, these approaches fail to make effective use of historical decision-making experience and overlook potential relationships among generated features, which limits the depth of knowledge extraction. Moreover, the decision-making process is too coarse-grained to support dynamic backtracking for individual features, so it adapts poorly when it encounters inefficient transformation paths, which harms overall robustness and exploration efficiency. To address these limitations, we introduce a novel method that uses a feature-state transformation graph to preserve the entire feature transformation journey, where each node represents a specific transformation state. During exploration, three cascading agents iteratively select nodes and mathematical operations to generate new transformation states. This strategy leverages the inherent properties of the graph structure, allowing valuable transformations to be preserved and reused. It also enables backtracking through graph pruning, which can rectify inefficient transformation paths. To validate the efficacy and flexibility of our approach, we conducted comprehensive experiments and detailed case studies, which demonstrate superior performance in diverse scenarios.
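The graph idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the class and method names (`TransformationGraph`, `add_root`, `apply`, `prune`) are hypothetical, the three reinforcement-learning agents are omitted (the caller chooses nodes and operations by hand), and pruning is assumed to remove a node together with every state derived from it.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """One transformation state: a feature column plus how it was derived."""
    name: str
    values: List[float]
    parents: Tuple[str, ...] = ()
    op: Optional[str] = None  # None marks an original (root) feature


class TransformationGraph:
    """Hypothetical sketch of a feature-state transformation graph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.children: Dict[str, Set[str]] = {}

    def add_root(self, name: str, values: List[float]) -> str:
        # Register an original feature as a node with no parents.
        self.nodes[name] = Node(name, list(values))
        self.children[name] = set()
        return name

    def apply(self, op_name: str, fn: Callable[..., float], *parents: str) -> str:
        # Create a new state by applying fn row-wise to the parent columns.
        columns = [self.nodes[p].values for p in parents]
        values = [fn(*row) for row in zip(*columns)]
        name = f"{op_name}({','.join(parents)})"
        self.nodes[name] = Node(name, values, tuple(parents), op_name)
        self.children[name] = set()
        for p in parents:
            self.children[p].add(name)
        return name

    def prune(self, name: str) -> Set[str]:
        # Assumed backtracking rule: drop the node and all its descendants.
        removed: Set[str] = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current in removed:
                continue
            removed.add(current)
            stack.extend(self.children[current])
        for r in removed:
            for p in self.nodes[r].parents:
                if p not in removed:
                    self.children[p].discard(r)
            del self.nodes[r]
            del self.children[r]
        return removed


graph = TransformationGraph()
graph.add_root("f1", [1.0, 2.0, 3.0])
graph.add_root("f2", [3.0, 7.0, 13.0])
total = graph.apply("add", lambda a, b: a + b, "f1", "f2")
root = graph.apply("sqrt", math.sqrt, total)
graph.prune(total)  # backtrack: discards "add(f1,f2)" and the state built on it
```

Because every state records its parents, earlier transformations stay available for reuse after a prune; in this sketch only the discarded branch is lost, while the original features remain in the graph.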
