In machine learning today, models are becoming larger and more complex, and the data we use are growing both in quantity and in dimensionality. To train better models while saving training time and computational resources, a good feature selection (FS) method in the preprocessing stage is necessary. Feature importance (FI) matters because it is the basis of feature selection. This paper introduces the calculation of PNS (the probability of necessity and sufficiency) from causal inference into the quantification of feature importance and proposes three new FI measures: PN_FI, which measures how important a feature is in image recognition tasks; PS_FI, which measures how important a feature is in image generation tasks; and PNS_FI, which measures both. The main body of this paper consists of three randomized controlled trials (RCTs), whose results we use to show how the PS_FI, PN_FI, and PNS_FI of three features (dog nose, dog eyes, and dog mouth) are calculated. The resulting FI values are intervals with tight upper and lower bounds.
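The abstract does not spell out how the interval bounds are computed. As a point of reference only, the sketch below assumes the standard Tian–Pearl bounds on PNS, PN, and PS, which may differ from the paper's exact procedure. Here P(y_x) and P(y_x') are interventional probabilities of the outcome when a feature is forced present or absent, as an RCT would estimate them. The PN and PS bounds additionally need observational quantities P(y), P(x,y), and P(x',y'). The function names, the `Interval` type, and all numbers in the example are hypothetical illustrations, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float


def pns_bounds(p_y_x: float, p_y_xp: float) -> Interval:
    """Tian-Pearl bounds on PNS from experimental (RCT) data alone.

    p_y_x  = P(y_x):  P(outcome y) when the feature is forced present.
    p_y_xp = P(y_x'): P(outcome y) when the feature is forced absent.
    """
    lower = max(0.0, p_y_x - p_y_xp)
    upper = min(p_y_x, 1.0 - p_y_xp)  # 1 - P(y_x') = P(y'_x')
    return Interval(lower, upper)


def pn_bounds(p_y: float, p_xy: float, p_xpyp: float, p_y_xp: float) -> Interval:
    """Tian-Pearl bounds on PN: observational P(y), P(x,y), P(x',y') plus P(y_x')."""
    lower = max(0.0, (p_y - p_y_xp) / p_xy)
    upper = min(1.0, ((1.0 - p_y_xp) - p_xpyp) / p_xy)
    return Interval(lower, upper)


def ps_bounds(p_y: float, p_xy: float, p_xpyp: float, p_y_x: float) -> Interval:
    """Tian-Pearl bounds on PS: observational P(y), P(x,y), P(x',y') plus P(y_x)."""
    lower = max(0.0, (p_y_x - p_y) / p_xpyp)
    upper = min(1.0, (p_y_x - p_xy) / p_xpyp)
    return Interval(lower, upper)


if __name__ == "__main__":
    # Hypothetical numbers for one feature (e.g. "dog nose"); NOT the paper's data.
    p_y_x, p_y_xp = 0.9, 0.3   # interventional probabilities from an RCT
    p_xy, p_xpyp = 0.45, 0.35  # observational joints P(x,y) and P(x',y')
    p_y = 0.6                  # observational P(y) = P(x,y) + P(x',y) = 0.45 + 0.15
    print("PNS:", pns_bounds(p_y_x, p_y_xp))
    print("PN: ", pn_bounds(p_y, p_xy, p_xpyp, p_y_xp))
    print("PS: ", ps_bounds(p_y, p_xy, p_xpyp, p_y_x))
```

With these made-up inputs, PNS falls in roughly [0.6, 0.7], showing how an FI value comes out as an interval rather than a point estimate. Only the PNS bound can be computed from RCT data alone; the PN and PS bounds become informative once observational data are also available.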