We study feature selection in a general machine learning (ML) context, one of the most critical problems in the field. Although many feature selection methods exist, they face challenges such as scalability, managing high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce "Adaptive Feature Selection with Binary Masking" (AFS-BM), which addresses these problems. AFS-BM performs feature selection and model training simultaneously through joint optimization. In particular, we use joint optimization and binary masking to continuously adapt the set of features and the model parameters during training. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments comparing AFS-BM with established feature selection methods on well-known datasets from real-life competitions. Our results show that AFS-BM significantly improves accuracy while requiring significantly less computation. This is due to AFS-BM's ability to dynamically adjust to the changing importance of features during training, which is an important contribution to the field. We openly share our code to support the replicability of our results and to facilitate further research.
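The idea of alternating between model updates and binary mask updates can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain linear model trained by gradient descent, uses the absolute weight as a stand-in for feature importance, and accepts a pruning step only if the validation loss rises by at most a tolerance. The function names (`masked_mse`, `fit_with_binary_mask`), the hyperparameters, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_mse(X, y, w, mask):
    """Mean squared error of a linear model that only sees unmasked features."""
    residual = X @ (w * mask) - y
    return float(np.mean(residual ** 2))


def fit_with_binary_mask(X_tr, y_tr, X_val, y_val,
                         rounds=20, steps=200, lr=0.05, tol=1e-2):
    """Alternate between training the weights and pruning the binary mask.

    Assumption: |weight| is used as a proxy for feature importance, and a
    feature is dropped only if validation loss grows by at most `tol`.
    """
    n_features = X_tr.shape[1]
    w = np.zeros(n_features)
    mask = np.ones(n_features)  # 1.0 = feature kept, 0.0 = feature masked out

    for _ in range(rounds):
        # Step 1: update the model parameters with the mask held fixed.
        for _ in range(steps):
            residual = X_tr @ (w * mask) - y_tr
            grad = 2.0 * (X_tr.T @ residual) / len(y_tr)
            w -= lr * grad * mask  # masked features receive no update

        # Step 2: try to mask out the least important active feature.
        active = np.flatnonzero(mask)
        if active.size <= 1:
            break
        candidate = active[np.argmin(np.abs(w[active]))]
        trial = mask.copy()
        trial[candidate] = 0.0

        current_loss = masked_mse(X_val, y_val, w, mask)
        trial_loss = masked_mse(X_val, y_val, w, trial)
        if trial_loss <= current_loss + tol:
            mask = trial  # accept: the feature was not needed
        else:
            break  # reject: every remaining feature matters

    return w * mask, mask.astype(bool)


# Synthetic data: only the first two of eight features carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(400)

weights, selected = fit_with_binary_mask(X[:300], y[:300], X[300:], y[300:])
print("selected features:", np.flatnonzero(selected))
```

Because the mask multiplies both the prediction and the gradient, a dropped feature stops contributing to the model and to later updates, so the cost of each training step shrinks as features are pruned. A greedy one-feature-at-a-time rule is only one possible mask update; any importance score and acceptance criterion could be substituted.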