We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are twofold:

- First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $\gamma$ be the weak learner's advantage over random guessing. The famous \textsc{AdaBoost} algorithm produces an accurate hypothesis by interacting with the weak learner for $\tilde{O}(1 / \gamma^2)$ rounds, where each round runs in polynomial time. Karbasi and Larsen showed that "significant" parallelization must incur an exponential blow-up: any boosting algorithm either interacts with the weak learner for $\Omega(1 / \gamma)$ rounds or incurs an $\exp(d / \gamma)$ blow-up in the complexity of training, where $d$ is the VC dimension of the hypothesis class. We close the gap by showing that any boosting algorithm either has $\Omega(1 / \gamma^2)$ rounds of interaction or incurs a smaller exponential blow-up of $\exp(d)$.
- Second, complementing our lower bound, we show that there exists a boosting algorithm that uses $\tilde{O}(1/(t \gamma^2))$ rounds and suffers a blow-up of only $\exp(d \cdot t^2)$. Plugging in $t = \omega(1)$ shows that the smaller blow-up in our lower bound is tight. More interestingly, this provides the first trade-off between the parallelism and the total work required for boosting.
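
For quick reference, the bounds stated above can be collected in a single display. This is only a restatement of the abstract, with constants and logarithmic factors suppressed. Each lower-bound row reads as "either at least this many rounds, or at least this blow-up".

\[
\begin{array}{lll}
\text{Result} & \text{Rounds of interaction} & \text{Blow-up in training complexity} \\
\hline
\text{\textsc{AdaBoost}} & \tilde{O}(1/\gamma^2) & \text{none (polynomial time per round)} \\
\text{Karbasi and Larsen, lower bound} & \Omega(1/\gamma) & \exp(d/\gamma) \\
\text{This work, lower bound} & \Omega(1/\gamma^2) & \exp(d) \\
\text{This work, algorithm} & \tilde{O}\big(1/(t\gamma^2)\big) & \exp(d \cdot t^2)
\end{array}
\]

As an illustration, take $t = 4$, a value chosen here purely as an example and not taken from the paper. The algorithmic bound then reads as $\tilde{O}(1/(4\gamma^2))$ rounds, about a factor of $4$ fewer than \textsc{AdaBoost}, while the exponent of the blow-up grows by a factor of $t^2 = 16$, giving $\exp(16\,d)$.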