In this paper we provide a novel methodology, based on mathematical optimization, for perturbing the features of a given observation so that a tree ensemble classification rule re-classifies it into a desired class. The method rests on three facts: the most viable changes for an observation to reach the desired class do not always coincide with the closest point (in the feature space) of the target class; individuals put effort into only a small number of features to reach the desired class; and each individual is endowed with a probability of changing each of its features to a given value, which determines the overall probability of changing to the target class. Putting these together, we provide different methods to find the features on which an individual must exert effort to maximize the probability of reaching the target class. Our method also allows us to rank the most important features in the tree ensemble. The proposed methodology is tested on a real dataset to validate the proposal.
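To make the idea concrete, the following is a minimal sketch rather than the optimization model proposed in the paper: it replaces the mathematical optimization formulation with brute-force enumeration on a toy ensemble. The three hand-written stumps, the feature names, the candidate values with their change probabilities, and the function `best_change` are all hypothetical. The sketch also assumes that feature changes succeed independently, so that the overall probability is the product of the per-feature probabilities; that assumption is made here for illustration only.

```python
from itertools import combinations, product

# Hypothetical toy ensemble: three decision stumps, each voting for class 0 or 1.
def stump_income(x):
    return 1 if x["income"] >= 50 else 0

def stump_debt(x):
    return 1 if x["debt"] <= 10 else 0

def stump_tenure(x):
    return 1 if x["years_employed"] >= 3 else 0

ENSEMBLE = [stump_income, stump_debt, stump_tenure]

def predict(x):
    """Majority vote of the toy ensemble."""
    votes = sum(tree(x) for tree in ENSEMBLE)
    return 1 if 2 * votes > len(ENSEMBLE) else 0

def best_change(x, candidates, target, max_features):
    """Enumerate changes to at most `max_features` features and return the
    (probability, changes) pair with the highest success probability among
    those that make the ensemble predict `target`.

    `candidates` maps a feature name to a list of (new_value, probability)
    pairs. Independence across features is an illustrative assumption.
    """
    best_prob, best_changes = 0.0, None
    features = list(candidates)
    for k in range(max_features + 1):
        for subset in combinations(features, k):
            # Pick one candidate value for every feature in the subset.
            for choice in product(*(candidates[f] for f in subset)):
                changed = dict(x)
                prob = 1.0
                for feature, (value, p) in zip(subset, choice):
                    changed[feature] = value
                    prob *= p
                if predict(changed) == target and prob > best_prob:
                    best_prob = prob
                    best_changes = {f: v for f, (v, _) in zip(subset, choice)}
    return best_prob, best_changes

# Hypothetical individual currently classified as class 0.
individual = {"income": 40, "debt": 20, "years_employed": 1}
# Hypothetical candidate values and the probability of achieving each one.
options = {
    "income": [(55, 0.3)],
    "debt": [(8, 0.6)],
    "years_employed": [(3, 0.5)],
}
probability, changes = best_change(individual, options, target=1, max_features=2)
print(probability, changes)
```

In this toy instance, the search selects the pair of feature changes with the largest joint probability among those that flip the majority vote, which need not be the pair that moves the observation the shortest distance in the feature space. Enumeration is only practical for a handful of features and candidate values.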