Fast Feature Selection with Fairness Constraints

We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for greedy forward selection of submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension supports downward-closed constraints, which can encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm under standard assumptions. These guarantees apply to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art feature selection techniques on real-world and synthetic datasets.
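The sketch below illustrates two of the ingredients named in the abstract: Orthogonal Matching Pursuit and a downward-closed constraint. It is not the paper's adaptive (parallel) algorithm. It is plain sequential OMP for least-squares regression and adds one feature per round. The fairness constraint is assumed, for illustration only, to be a per-group cap: each feature carries a hypothetical group label, and at most `caps[g]` features may come from group `g`. Any subset of a set that satisfies the caps also satisfies them, which is what downward-closed means. The name `constrained_omp` and its parameters are hypothetical.

```python
import numpy as np


def constrained_omp(X, y, k, groups, caps):
    """Sequential OMP that picks up to k columns of X under per-group caps.

    Illustrative sketch, not the paper's adaptive algorithm.
    groups[j] is a (hypothetical) group label for feature j, and caps[g] is
    the maximum number of selected features allowed from group g; caps must
    have an entry for every label. Per-group caps are downward-closed.
    """
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero columns
    selected = []
    counts = dict.fromkeys(caps, 0)
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(k):
        # Normalized correlation of each column with the current residual.
        scores = np.abs(X.T @ residual) / norms
        best = None
        for j in np.argsort(-scores):
            j = int(j)
            if j not in selected and counts[groups[j]] < caps[groups[j]]:
                best = j
                break
        if best is None:
            break  # no remaining feature keeps the selection feasible
        selected.append(best)
        counts[groups[best]] += 1
        # "Orthogonal" step: refit least squares on the whole support.
        Xs = X[:, selected]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        residual = y - Xs @ beta
    return selected


# Usage: orthonormal columns make the greedy choices easy to follow.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(100, 5)))
y = Q @ np.array([3.0, 2.0, 0.0, 1.0, 0.0])
groups = ["a", "a", "b", "b", "b"]
print(constrained_omp(Q, y, 2, groups, {"a": 2, "b": 2}))  # [0, 1]
print(constrained_omp(Q, y, 2, groups, {"a": 1, "b": 2}))  # [0, 3]
```

With a cap of one feature from group `a`, the second-strongest feature (index 1) is blocked. The selection falls back to the best feasible feature in group `b`. OMP scores every candidate with one matrix-vector product against the residual instead of refitting a model per candidate, which is why it is cheaper than greedy forward selection.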



Feature Selection


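Feature selection, also called Feature Subset Selection (FSS) or Attribute Selection, is the task of choosing N features from an existing set of M features so that a specific performance metric of the system is optimized. It selects the most effective features from the original ones to reduce the dimensionality of a dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For any learning algorithm, good training samples are the key to training a good model.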
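The definition of choosing N of M features to optimize a metric can be made concrete with plain greedy forward selection. Start from an empty set, then repeatedly add the feature whose inclusion most improves the metric, until N features are selected. This sketch assumes the metric is the uncentered R^2 of an ordinary least-squares fit with no intercept. `forward_select` and `r_squared` are hypothetical helper names.

```python
import numpy as np


def r_squared(X, y, subset):
    """Uncentered R^2 of a least-squares fit of y on the columns in subset."""
    if not subset:
        return 0.0
    Xs = X[:, subset]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - float(resid @ resid) / float(y @ y)


def forward_select(X, y, n_select):
    """Greedily choose n_select of the M columns of X to maximize R^2."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(min(n_select, X.shape[1])):
        # Refit once per candidate and keep the feature with the best score.
        best = max(remaining, key=lambda j: r_squared(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(100, 5)))
y = Q @ np.array([3.0, 2.0, 0.0, 1.0, 0.0])
print(forward_select(Q, y, 2))  # [0, 1]
```

Each round refits one model per remaining feature. This per-candidate refitting is why greedy forward selection becomes expensive when M is large.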
