Feature selection, which comprises filter, wrapper, and embedded approaches, aims to find the optimal feature subset for a given downstream task. Nevertheless, current feature selection methods have two limitations: 1) their selection criteria vary across domains, making them hard to generalize; 2) their selection performance drops significantly on high-dimensional feature spaces coupled with small sample sizes. In light of these challenges, we pose the question: can selected feature subsets be more robust, more accurate, and agnostic to input dimensionality? In this paper, we reformulate feature selection as a deep differentiable optimization task and propose a new research perspective: conceptualizing discrete feature subsetting as optimization in a continuous embedding space. We introduce a novel and principled framework that comprises a sequential encoder, an accuracy evaluator, a sequential decoder, and a gradient ascent optimizer. The framework proceeds in four steps: preparation of feature-accuracy training data, deep feature subset embedding, gradient-optimized search, and feature subset reconstruction. Specifically, we use reinforcement feature selection learning to generate diverse, high-quality training data and enhance generalization. By optimizing reconstruction and accuracy losses, we embed feature selection knowledge into a continuous space using an encoder-evaluator-decoder model structure. We then employ a gradient ascent search algorithm to find better embeddings in the learned embedding space. Finally, we reconstruct feature selection solutions from these embeddings and select the feature subset with the highest downstream-task performance as the optimal subset.
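To make the search idea concrete, the following is a minimal sketch of gradient ascent over a continuous representation of a feature subset, followed by reconstruction of a discrete subset. It is an illustration under strong assumptions, not the framework itself: the functions `encode`, `evaluate`, `evaluate_grad`, `gradient_ascent`, and `decode` are hypothetical names, the "encoder" and "decoder" are hand-written mappings rather than learned sequential networks, and the "evaluator" is a fixed concave surrogate with an analytic gradient rather than a trained accuracy predictor.

```python
from typing import List


def encode(subset: List[int], n_features: int) -> List[float]:
    """Toy stand-in for the encoder: feature indices -> continuous vector."""
    chosen = set(subset)
    return [1.0 if i in chosen else 0.0 for i in range(n_features)]


def evaluate(z: List[float], weights: List[float], penalty: float) -> float:
    """Toy stand-in for the accuracy evaluator (assumed concave surrogate)."""
    return sum(w * v - penalty * v * v for w, v in zip(weights, z))


def evaluate_grad(z: List[float], weights: List[float], penalty: float) -> List[float]:
    """Analytic gradient of the toy evaluator with respect to the embedding."""
    return [w - 2.0 * penalty * v for w, v in zip(weights, z)]


def gradient_ascent(
    z: List[float],
    weights: List[float],
    penalty: float,
    step_size: float = 0.1,
    n_steps: int = 50,
) -> List[float]:
    """Move the embedding along the evaluator gradient to raise its score."""
    z = list(z)  # copy so the caller's embedding is not modified
    for _ in range(n_steps):
        grad = evaluate_grad(z, weights, penalty)
        z = [v + step_size * g for v, g in zip(z, grad)]
    return z


def decode(z: List[float], threshold: float = 0.5) -> List[int]:
    """Toy stand-in for the decoder: keep features whose coordinate is large."""
    return [i for i, v in enumerate(z) if v > threshold]


# Hypothetical setup: 4 features, with per-feature usefulness chosen by hand.
weights = [0.9, 0.1, 0.8, -0.3]
penalty = 0.5

start = encode([1, 3], n_features=4)            # a poor initial subset
found = gradient_ascent(start, weights, penalty)
selected = decode(found)                         # reconstructed subset
```

In this toy setting the search starts from a low-scoring subset and moves toward embeddings the surrogate scores higher, after which thresholding recovers a discrete subset. In the framework described above, the evaluator gradient would instead come from differentiating a learned model, and several reconstructed candidates would be compared on the downstream task.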