Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models given access to a weak learner oracle, which provides classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first-order information about a loss, yet the decades-long history of boosting quickly evolved it into a first-order optimization setting, sometimes even wrongly defining it as such. Owing to recent progress extending gradient-based optimization to use only a loss's zeroth ($0^{th}$) order information to learn, this raises the question: which loss functions can be efficiently optimized with boosting, and what information is really needed for boosting to meet the original boosting blueprint's requirements? We provide a constructive formal answer essentially showing that any loss function can be optimized with boosting, and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$-order setting, since loss functions are not required to be convex, differentiable, or Lipschitz, and in fact not required to be continuous either. Some of the tools we use are rooted in quantum calculus, the mathematical field (not to be confused with quantum computation) that studies calculus without passing to the limit, and thus without using first-order information.
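For illustration only (a standard sketch, not the paper's own development or notation), the basic objects of quantum calculus are finite-difference derivatives that never pass to the limit, and hence require only $0^{th}$-order (function value) information:
\[
D_q f(x) \;\doteq\; \frac{f(qx) - f(x)}{(q-1)\,x} \quad (q\text{-derivative},\ q \neq 1),
\qquad
D_h f(x) \;\doteq\; \frac{f(x+h) - f(x)}{h} \quad (h\text{-derivative},\ h \neq 0).
\]
Both recover $f'(x)$ as $q \to 1$ or $h \to 0$ when $f$ is differentiable, but remain well defined for any $f$ using only function evaluations, which is the sense in which such tools avoid first-order information.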