The principle of boosting in supervised learning is to combine multiple weak classifiers into a stronger classifier. AdaBoost is widely regarded as a perfect example of this approach. This study analyzes the two-class AdaBoost procedure implemented in scikit-learn. We show that AdaBoost is an algorithm in name only, since the resulting combination of weak classifiers can be computed explicitly from a truth table. Indeed, a logical analysis of the training set, in which the weak classifiers are used to construct a truth table, lets us recover through an analytical formula the weights that the procedure assigns to these weak classifiers. We observe that this formula does not give the minimizer of the risk, we provide a system to compute the exact minimizer, and we verify that the AdaBoost procedure in scikit-learn does not implement the algorithm described by Freund and Schapire.
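For orientation, the textbook discrete AdaBoost update usually attributed to Freund and Schapire can be sketched as below. This is a minimal illustration under our own assumptions, not the analytical formula of the paper and not scikit-learn's code: the function name `adaboost_weights`, the ±1 label encoding, and the choice to represent each weak classifier by its outputs on the training set (one row of a truth table) are hypothetical.

```python
import math


def adaboost_weights(predictions, labels, rounds):
    """Textbook discrete AdaBoost over a fixed pool of weak classifiers.

    predictions[j][i] is the +1/-1 output of weak classifier j on sample i,
    labels[i] is the +1/-1 label of sample i (assumed encoding).
    Returns a list of (classifier_index, alpha) pairs, one per round.
    """
    n = len(labels)
    dist = [1.0 / n] * n  # uniform initial distribution over samples
    chosen = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current distribution.
        errors = [
            sum(d for d, p, y in zip(dist, preds, labels) if p != y)
            for preds in predictions
        ]
        j = min(range(len(errors)), key=errors.__getitem__)
        eps = errors[j]
        if eps <= 0.0 or eps >= 0.5:
            break  # perfect or useless weak classifier: stop (a simplification)
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        chosen.append((j, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        dist = [
            d * math.exp(-alpha * y * p)
            for d, p, y in zip(dist, predictions[j], labels)
        ]
        total = sum(dist)
        dist = [d / total for d in dist]
    return chosen


# Toy truth table: three weak classifiers, each wrong on exactly one sample.
labels = [1, 1, -1, -1]
preds = [
    [1, 1, -1, 1],
    [1, -1, -1, -1],
    [-1, 1, -1, -1],
]
for index, alpha in adaboost_weights(preds, labels, rounds=2):
    print(index, round(alpha, 4))
```

The final classifier in this sketch would be the sign of the alpha-weighted sum of the selected weak classifiers. Whether scikit-learn's procedure yields the same weights is precisely what the abstract disputes.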