Data-Centric AI: Deep Generative Differentiable Feature Selection via Discrete Subsetting as Continuous Embedding Space Optimization

Feature selection (FS), whether by filter, wrapper, or embedded methods, aims to find the optimal feature subset for a given downstream task. However, in many real-world settings, 1) the criteria for FS vary across domains; 2) FS is brittle when data is high-dimensional with a small sample size. Can selected feature subsets be more generalized, accurate, and agnostic to input dimensionality? We generalize this problem into a deep differentiable feature selection task and propose a new perspective: discrete feature subsetting as continuous embedding space optimization. We develop a generic and principled framework comprising a deep feature subset encoder, an accuracy evaluator, a decoder, and a gradient ascent optimizer. This framework implements four steps: 1) features-accuracy training data preparation; 2) deep feature subset embedding; 3) gradient-optimized search; 4) feature subset reconstruction. We develop new technical insights: reinforcement as a training data generator; ensembles of diverse peer and exploratory feature selector knowledge for generalization; and an effective embedding from feature subsets to a continuous space, together with jointly optimizing reconstruction and accuracy losses to select accurate features. Experimental results demonstrate the effectiveness of the proposed method.
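
The encode, search, and decode steps named in the abstract can be illustrated with a small toy. This is only a sketch under strong assumptions, not the authors' implementation: the encoder, evaluator, and decoder below are hand-written stand-ins rather than learned deep networks, and the names `UTILITY`, `COST`, `encode`, `evaluate`, `evaluate_grad`, `decode`, and `gradient_ascent` are all hypothetical. The training-data preparation step and the joint reconstruction and accuracy losses are not modeled here.

```python
import math

# Hypothetical per-feature utilities. In the described framework the evaluator
# would be learned from (feature subset, accuracy) pairs; here it is hard-coded.
UTILITY = [0.9, 0.8, -0.3, -0.2, 0.3]
COST = 0.1  # assumed penalty paid for each included feature


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(subset, n_features, scale=2.0):
    """Toy encoder: map a discrete subset to a continuous vector."""
    return [scale if i in subset else -scale for i in range(n_features)]


def evaluate(z):
    """Toy differentiable evaluator: surrogate score of embedding z."""
    return sum((u - COST) * sigmoid(zi) for u, zi in zip(UTILITY, z))


def evaluate_grad(z):
    """Analytic gradient of evaluate() with respect to z."""
    return [(u - COST) * sigmoid(zi) * (1.0 - sigmoid(zi))
            for u, zi in zip(UTILITY, z)]


def decode(z):
    """Toy decoder: keep features whose soft inclusion weight exceeds 0.5."""
    return [i for i, zi in enumerate(z) if sigmoid(zi) > 0.5]


def gradient_ascent(z, steps=500, lr=1.0):
    """Move the embedding in the direction that increases the surrogate score."""
    z = list(z)
    for _ in range(steps):
        grad = evaluate_grad(z)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


# Start from a deliberately poor seed subset and search in the continuous space.
z_seed = encode({2, 3}, n_features=len(UTILITY))
z_final = gradient_ascent(z_seed)
selected = decode(z_final)
print(selected)  # [0, 1, 4]
```

Starting from the seed subset {2, 3}, the ascent raises the coordinates of features whose assumed utility exceeds the cost and lowers the others, so the decoded subset is [0, 1, 4]. In the framework described above, the evaluator's gradient would come from a trained network rather than a closed-form expression.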
