We study feature selection in a general machine learning (ML) context, one of the most critical problems in the field. Although many feature selection methods exist, they face challenges such as scalability, handling high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce "Adaptive Feature Selection with Binary Masking" (AFS-BM), which addresses these problems. AFS-BM performs feature selection and model training simultaneously through joint optimization. In particular, we combine joint optimization with binary masking to continuously adapt both the set of features and the model parameters during training. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments comparing AFS-BM with established feature selection methods on well-known datasets from real-life competitions. Our results show that AFS-BM achieves significantly higher accuracy at a significantly lower computational cost. This is due to AFS-BM's ability to adjust dynamically to the changing importance of features during training, which is an important contribution to the field. We openly share our code to support the replicability of our results and to facilitate further research.
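To illustrate the idea of alternating between model training and binary mask updates, the following is a minimal sketch and not the AFS-BM algorithm itself, whose details the abstract does not give. It assumes a linear model trained by gradient descent on squared error, and a greedy rule that switches a feature off when doing so raises the validation loss by no more than a tolerance. The function names (`fit_masked_linear`, `mse`, `adaptive_mask_selection`) and the parameters `lr`, `steps`, `rounds`, and `tol` are hypothetical and do not come from the paper or its released code.

```python
import numpy as np


def fit_masked_linear(X, y, mask, w=None, lr=0.1, steps=200):
    """Gradient descent on squared error, using only features where mask == 1."""
    n, d = X.shape
    w = np.zeros(d) if w is None else w.copy()
    for _ in range(steps):
        residual = X @ (w * mask) - y
        # Masked-out features receive a zero gradient, so they are not trained.
        grad = (X.T @ residual) / n * mask
        w -= lr * grad
    return w


def mse(X, y, w, mask):
    """Mean squared error of the masked linear model."""
    return float(np.mean((X @ (w * mask) - y) ** 2))


def adaptive_mask_selection(X_tr, y_tr, X_val, y_val, rounds=5, tol=1e-2):
    """Alternate between training the model and pruning the binary mask.

    Assumption (not from the paper): a feature is switched off when the
    validation loss without it exceeds the current loss by at most `tol`.
    """
    d = X_tr.shape[1]
    mask = np.ones(d)  # start with every feature active
    w = None
    for _ in range(rounds):
        # Step 1: train (warm-started) model parameters under the current mask.
        w = fit_masked_linear(X_tr, y_tr, mask, w)
        base = mse(X_val, y_val, w, mask)

        # Step 2: update the mask using held-out data.
        changed = False
        for j in np.flatnonzero(mask):
            trial = mask.copy()
            trial[j] = 0.0
            if trial.sum() == 0:
                continue  # always keep at least one feature
            trial_loss = mse(X_val, y_val, w, trial)
            if trial_loss <= base + tol:
                mask, base, changed = trial, trial_loss, True
        if not changed:
            break  # the mask has stabilized

    # Final refit on the selected features only.
    w = fit_masked_linear(X_tr, y_tr, mask, w)
    return mask.astype(int), w * mask


if __name__ == "__main__":
    # Synthetic data: only the first two of six features carry signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 6))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=400)
    mask, w = adaptive_mask_selection(X[:300], y[:300], X[300:], y[300:])
    print("selected mask:", mask)
    print("weights:", np.round(w, 3))
```

In this sketch the mask only ever shrinks, which keeps the loop simple; a method that adapts to changing feature importance, as the abstract describes, could also allow features to be switched back on.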