Boosting is a highly successful ML-born optimization setting in which one is required to learn arbitrarily good models in a computationally efficient way, given access to a weak learner oracle that provides classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first-order information about a loss, yet the decades-long history of boosting has quickly evolved it into a first-order optimization setting, sometimes even wrongfully \textit{defining} it as such. Owing to recent progress extending gradient-based optimization to use only a loss's zeroth ($0^{th}$) order information to learn, this raises the question: what loss functions can be efficiently optimized with boosting, and what information is really needed for boosting to meet the \textit{original} boosting blueprint's requirements? We provide a constructive formal answer essentially showing that \textit{any} loss function can be optimized with boosting, and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting, since loss functions are not required to be convex, differentiable, or Lipschitz, and in fact are not required to be continuous either. Some tools we use are rooted in quantum calculus, the mathematical field (not to be confused with quantum computation) that studies calculus without passing to the limit, and thus without using first-order information.
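To make "calculus without passing to the limit" concrete, here is a minimal sketch of the $h$-derivative from quantum calculus, i.e. the secant slope $(f(x+h) - f(x))/h$ for a fixed nonzero $h$. The function names `h_derivative` and `zero_one_loss` are illustrative assumptions; this is not the algorithm of the paper, only an illustration that such a quantity uses zeroth-order (function value) information alone and stays well defined for a discontinuous loss.

```python
def h_derivative(f, x, h):
    """Secant slope of f between x and x + h; no limit is taken (h must be nonzero)."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x + h) - f(x)) / h


def zero_one_loss(z):
    """Hypothetical example loss on a margin z: 1 if z <= 0 (error), else 0."""
    return 1.0 if z <= 0 else 0.0


# The 0/1 loss is discontinuous at z = 0, so it has no derivative there,
# yet the secant slope across the jump is computed from two loss values only.
slope_across_jump = h_derivative(zero_one_loss, -0.5, 1.0)

# On a smooth function, the secant slope differs from the derivative by a
# term depending on h: for f(z) = z^2 it equals 2z + h.
slope_square = h_derivative(lambda z: z * z, 1.0, 0.5)
```

The offset `h` is kept as an explicit argument rather than shrunk toward zero, which is the defining choice of this kind of calculus.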