One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the constraint that the positive weights on the examples sum to one. Essentially, harder examples receive higher probabilities. We generalize this setup to the recently introduced {\it tempered exponential measure}s (TEMs), where normalization is enforced on a specific power of the measure and not the measure itself. TEMs are indexed by a parameter $t$ and generalize exponential families ($t=1$). Our algorithm, $t$-AdaBoost, recovers AdaBoost as a special case ($t=1$). We show that $t$-AdaBoost retains AdaBoost's celebrated exponential convergence rate when $t\in [0,1)$ while allowing a slight improvement of the rate's hidden constant compared to $t=1$. $t$-AdaBoost partially computes on a generalization of classical arithmetic over the reals and brings notable properties such as guaranteed bounded leveraging coefficients for $t\in [0,1)$. From the loss that $t$-AdaBoost minimizes (a generalization of the exponential loss), we show how to derive a new family of {\it tempered} losses for the induction of domain-partitioning classifiers like decision trees. Crucially, strict properness is ensured for all of them, while their boosting rates span the full known spectrum. Experiments using $t$-AdaBoost+trees show that significant leverage can be achieved by tuning $t$.
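To make the role of $t$ concrete, the following is a minimal sketch of the standard tempered logarithm and exponential, together with a normalization applied to a power of the weights instead of the weights themselves. The function names are hypothetical, and the exponent $2-t$ used for normalization is an assumption drawn from the usual TEM convention; the abstract itself does not state which power is normalized. The sketch is restricted to $t\in[0,1]$ and is not the authors' implementation.

```python
import math


def log_t(z, t):
    """Tempered logarithm of z > 0; equals math.log(z) at t = 1."""
    if t == 1.0:
        return math.log(z)
    return (z ** (1.0 - t) - 1.0) / (1.0 - t)


def exp_t(z, t):
    """Tempered exponential for t in [0, 1], clipped at zero.

    Equals math.exp(z) at t = 1. For t < 1 it is exactly zero once
    1 + (1 - t) * z becomes negative.
    """
    if t == 1.0:
        return math.exp(z)
    base = max(0.0, 1.0 + (1.0 - t) * z)
    return base ** (1.0 / (1.0 - t))


def normalize_tem(weights, t):
    """Rescale nonnegative weights so that sum(w ** (2 - t)) == 1.

    ASSUMPTION: the normalized power is 2 - t. At t = 1 this is the
    ordinary "weights sum to one" constraint.
    """
    power = 2.0 - t
    total = sum(w ** power for w in weights)
    scale = total ** (1.0 / power)
    return [w / scale for w in weights]


if __name__ == "__main__":
    t = 0.5
    q = normalize_tem([1.0, 2.0, 3.0], t)
    print("normalized weights:", q)
    print("sum of q ** (2 - t):", sum(w ** (2.0 - t) for w in q))
    print("exp_t(log_t(2.0)):", exp_t(log_t(2.0, t), t))
```

At $t=1$ the three functions reduce to the natural logarithm, the exponential, and the usual simplex normalization, which mirrors how $t$-AdaBoost recovers AdaBoost as a special case.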