Feature selection is popular for obtaining small, interpretable, yet highly accurate prediction models. Conventional feature-selection methods typically yield only one feature set, which might not suffice in some scenarios. For example, users might be interested in finding alternative feature sets that have similar prediction quality but offer different explanations of the data. In this article, we introduce alternative feature selection and formalize it as an optimization problem. In particular, we define alternatives via constraints and enable users to control the number and dissimilarity of alternatives. Next, we analyze the complexity of this optimization problem and show that it is NP-hard. Further, we discuss how to integrate conventional feature-selection methods as objectives. Finally, we evaluate alternative feature selection on 30 classification datasets. We observe that alternative feature sets may indeed have high prediction quality, and we analyze several factors influencing this outcome.
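To make the idea of defining alternatives via constraints more concrete, the following is a minimal sketch, not the formulation from the article. It assumes a univariate objective (the sum of per-feature quality scores), a Dice-based dissimilarity between feature sets, a user-chosen threshold `tau`, and a sequential brute-force search that is only feasible for a handful of features. The names `dice_dissimilarity`, `find_alternatives`, `qualities`, `k`, `num_alternatives`, and `tau` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def dice_dissimilarity(a, b):
    """Return 1 minus the Dice coefficient of two non-empty feature sets."""
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def find_alternatives(qualities, k, num_alternatives, tau):
    """Sequentially search for feature sets of size k.

    Assumptions (not taken from the article): the objective is the sum of
    per-feature qualities, and each new set must have a Dice dissimilarity
    of at least tau to every previously found set.
    Returns the original set followed by up to num_alternatives alternatives.
    """
    found = []
    n = len(qualities)
    for _ in range(num_alternatives + 1):
        best_set, best_score = None, float("-inf")
        # Exhaustive search over all size-k subsets (exponential in general).
        for combo in combinations(range(n), k):
            candidate = frozenset(combo)
            # Constraint: sufficiently dissimilar to all earlier sets.
            if any(dice_dissimilarity(candidate, prev) < tau for prev in found):
                continue
            score = sum(qualities[j] for j in combo)
            if score > best_score:
                best_set, best_score = candidate, score
        if best_set is None:
            break  # no feasible alternative remains
        found.append(best_set)
    return found
```

In this sketch, `num_alternatives` controls how many alternatives are requested and `tau` controls how different they must be; `tau = 1.0` forces fully disjoint feature sets, while smaller values permit overlap. The exhaustive enumeration is used only to keep the example exact and short.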