This paper introduces a novel approach to active feature acquisition for classification: the task of sequentially selecting, at test time, the most informative subset of features so as to maximize prediction performance while minimizing acquisition cost. The proposed approach is a new lazy model that is significantly faster than existing methods while producing comparable accuracy. During the test phase, it ranks features by Fisher score to identify the most important feature at each step. The training dataset is then filtered based on the observed value of the selected feature, and the process repeats until an acceptable accuracy is reached or the feature acquisition budget is exhausted. The approach was evaluated on synthetic and real datasets, including our new synthetic dataset, the CUBE dataset, and the real Forest dataset. The experimental results show that our approach achieves accuracy competitive with existing methods while significantly outperforming them in speed. The source code is available on GitHub: https://github.com/alimirzaei/FCwSFS.
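The test-time loop described in the abstract (rank features by Fisher score, acquire the top one, filter the training set on its observed value, repeat) can be illustrated with a minimal sketch. This is not the released FCwSFS code. The function names, the `query` callback, the nearest-neighbour filtering rule controlled by `keep_frac`, and the class-purity stopping test standing in for "acceptable accuracy" are all assumptions made for illustration.

```python
import numpy as np


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def lazy_acquire_and_classify(X_train, y_train, query, budget=3,
                              keep_frac=0.5, purity=0.95):
    """Sequentially acquire features for one test sample, then predict.

    query(j) returns the test sample's value for feature j (hypothetical
    callback standing in for a costly measurement).
    """
    X, y = X_train, y_train
    acquired = []
    for _ in range(min(budget, X_train.shape[1])):
        # Assumed stopping rule: stop once the filtered set is nearly pure.
        _, counts = np.unique(y, return_counts=True)
        if counts.max() / counts.sum() >= purity:
            break
        # Rank features on the *current* filtered training set.
        scores = fisher_scores(X, y)
        scores[np.array(acquired, dtype=int)] = -np.inf  # never re-acquire
        j = int(np.argmax(scores))
        value = query(j)
        acquired.append(j)
        # Assumed filter: keep the training rows closest to the observed value.
        k = max(1, int(np.ceil(keep_frac * len(X))))
        nearest = np.argsort(np.abs(X[:, j] - value), kind="stable")[:k]
        X, y = X[nearest], y[nearest]
    # Predict by majority vote over the remaining training rows.
    labels, counts = np.unique(y, return_counts=True)
    return labels[np.argmax(counts)], acquired
```

Because the model is lazy, nothing is fitted ahead of time: each test sample triggers its own sequence of rankings and filters, and the per-step cost is one pass over the rows that remain.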