Feature Selection (FS) methods, such as filter, wrapper, and embedded approaches, aim to find the optimal feature subset for a given downstream task. However, in many real-world settings, 1) the criteria for FS vary across domains, and 2) FS is brittle when data are high-dimensional with a small sample size. Can selected feature subsets be more generalizable, more accurate, and agnostic to input dimensionality? We generalize this problem into a deep differentiable feature selection task and propose a new perspective: discrete feature subsetting as continuous embedding space optimization. We develop a generic and principled framework consisting of a deep feature subset encoder, an accuracy evaluator, a decoder, and a gradient ascent optimizer. The framework implements four steps: 1) feature-accuracy training data preparation; 2) deep feature subset embedding; 3) gradient-optimized search; and 4) feature subset reconstruction. We contribute new technical insights: reinforcement learning as a training data generator; ensembles of knowledge from diverse peer and exploratory feature selectors for generalization; and an effective embedding of feature subsets into a continuous space, with reconstruction and accuracy losses optimized jointly to select accurate features. Experimental results demonstrate the effectiveness of the proposed method.
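To make the encode, search, and decode loop concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: in the framework the encoder, accuracy evaluator, and decoder are deep networks trained jointly on reconstruction and accuracy losses, whereas here they are replaced by fixed toy functions. The names `encode`, `evaluate`, `decode`, `gradient_search`, the per-feature `weights`, and the size `penalty` are all hypothetical, chosen only so that steps 3 (gradient-optimized search) and 4 (feature subset reconstruction) can be run end to end.

```python
import math

# Hypothetical stand-ins for the learned components. In the framework, the
# encoder, evaluator, and decoder are trained deep networks; these fixed toy
# functions exist only to illustrate the search loop in embedding space.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def encode(mask: list[int]) -> list[float]:
    """Toy encoder: map a 0/1 feature mask to a signed continuous vector."""
    return [1.0 if m else -1.0 for m in mask]

def decode(z: list[float]) -> list[int]:
    """Toy decoder: keep feature i when its embedding coordinate is positive."""
    return [1 if zi > 0 else 0 for zi in z]

def evaluate(z: list[float], weights: list[float], penalty: float) -> float:
    """Toy accuracy evaluator: per-feature usefulness minus a subset-size penalty."""
    return sum((w - penalty) * sigmoid(zi) for w, zi in zip(weights, z))

def evaluate_grad(z: list[float], weights: list[float], penalty: float) -> list[float]:
    """Analytic gradient of `evaluate` with respect to the embedding z."""
    return [(w - penalty) * sigmoid(zi) * (1.0 - sigmoid(zi))
            for w, zi in zip(weights, z)]

def gradient_search(start_mask: list[int], weights: list[float], penalty: float,
                    lr: float = 0.5, steps: int = 200) -> tuple[list[float], list[int]]:
    """Embed a seed subset, run gradient ascent on the evaluator, decode the result."""
    z = encode(start_mask)
    for _ in range(steps):
        g = evaluate_grad(z, weights, penalty)
        z = [zi + lr * gi for zi, gi in zip(z, g)]  # ascent: move uphill in score
    return z, decode(z)

# Usage with made-up numbers: the seed subset could come from any selector,
# e.g. one of the reinforcement-learning-generated training records.
weights = [0.9, 0.1, 0.7, 0.05]
seed = [0, 1, 0, 1]
z_final, best = gradient_search(seed, weights, penalty=0.3)
print(best)  # → [1, 0, 1, 0]
```

Because the toy evaluator rewards feature i only when its usefulness exceeds the penalty, ascent pushes those coordinates positive and the rest negative, so the decoded subset flips away from the poor seed. With a learned evaluator the landscape is no longer separable per feature, which is what makes searching in a jointly trained embedding space worthwhile.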