This paper proposes a method for automatically creating variables, in the context of regression, that complement the information contained in the initial input vector. The method acts as a pre-processing step: the continuous values of the target variable are discretized into a set of intervals, which are then used to define value thresholds. Classifiers are then trained to predict whether the target value is less than or equal to each of these thresholds. The outputs of these classifiers are concatenated into an additional vector of variables that enriches the initial input vector of the regression problem. The resulting system can therefore be used as a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it on 33 regression datasets. Our experimental results confirm the value of the approach.
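The abstract does not specify how the intervals are chosen or which classifier is used, so the following is only a minimal sketch of the enrichment step under stated assumptions: equal-frequency intervals computed with `statistics.quantiles`, and a small hand-rolled logistic regression standing in for the classifiers. The names `fit_enricher`, `enrich`, `train_logistic`, and `n_intervals` are hypothetical and illustrative, not taken from the paper.

```python
import math
from statistics import quantiles


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def thresholds_from_targets(y, n_intervals=4):
    # Assumption: equal-frequency discretization of the target; the interior
    # cut points between consecutive intervals serve as value thresholds.
    return sorted(set(quantiles(y, n=n_intervals)))


def train_logistic(X, labels, lr=0.1, epochs=500):
    # Hypothetical stand-in classifier: logistic regression fitted by
    # stochastic gradient descent on the log-loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - t  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_enricher(X, y, n_intervals=4):
    # One binary classifier per threshold t, trained on the label [y <= t].
    thresholds = thresholds_from_targets(y, n_intervals)
    models = []
    for t in thresholds:
        labels = [1.0 if yi <= t else 0.0 for yi in y]
        models.append(train_logistic(X, labels))
    return thresholds, models


def enrich(X, models):
    # Append each classifier's estimate of P(y <= t | x) to the original vector.
    return [list(x) + [predict_proba(m, x) for m in models] for x in X]


# Usage on a toy 1-D problem where y grows with x.
X = [[i / 10] for i in range(20)]
y = [2 * row[0] for row in X]
thresholds, models = fit_enricher(X, y, n_intervals=4)
X_enriched = enrich(X, models)
print(len(thresholds), len(X_enriched[0]))  # prints: 3 4
```

Each original row gains one extra feature per threshold, and any regressor can then be trained on `X_enriched` instead of `X`. One design point the sketch glosses over: fitting the classifiers and enriching the same rows can leak target information into the new features, so generating the training-set features out-of-fold (cross-validated predictions) is a common safeguard.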