For each of $T$ time steps, $m$ experts report probability distributions over $n$ outcomes; we wish to learn to aggregate these forecasts in a way that attains a no-regret guarantee. We focus on the fundamental and practical aggregation method known as logarithmic pooling (a weighted average of log odds), which is in a certain sense the optimal pooling method when the goal is to minimize log loss, which we take as our loss function. We consider the problem of learning the best set of parameters (i.e., the expert weights) in an online adversarial setting. We assume (by necessity) that the adversarial choices of outcomes and forecasts are consistent, in the sense that experts report calibrated forecasts. To our knowledge, imposing this constraint creates a novel semi-adversarial setting in which the adversary retains a large amount of flexibility. In this setting, we present an algorithm based on online mirror descent that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret compared with the best weights in hindsight.
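As a concrete illustration of the pooling step only, the following minimal sketch computes a logarithmic pool, taking the pooled probability of outcome $j$ to be proportional to $\prod_i p_{i,j}^{w_i}$, and evaluates its log loss. The function names `log_pool` and `log_loss`, the example forecasts, and the assumption that weights are nonnegative and sum to 1 with strictly positive forecast probabilities are illustrative choices, not taken from the paper; the online mirror descent procedure for learning the weights is not shown.

```python
import math


def log_pool(forecasts, weights):
    """Logarithmic pool of expert forecasts.

    forecasts: list of m distributions, each a list of n strictly positive
               probabilities (assumed here so that log() is defined).
    weights:   list of m nonnegative expert weights, assumed to sum to 1.
    Returns a distribution with p_j proportional to prod_i forecasts[i][j] ** weights[i].
    """
    n = len(forecasts[0])
    # Weighted sum of log-probabilities for each outcome.
    log_scores = [
        sum(w * math.log(f[j]) for f, w in zip(forecasts, weights))
        for j in range(n)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def log_loss(pooled, outcome):
    """Log loss of a pooled forecast given the index of the realized outcome."""
    return -math.log(pooled[outcome])


# Hypothetical example: two experts, two outcomes, equal weights.
example_forecasts = [[0.5, 0.5], [0.8, 0.2]]
example_weights = [0.5, 0.5]
pooled = log_pool(example_forecasts, example_weights)  # approximately [2/3, 1/3]
loss_if_first = log_loss(pooled, 0)
```

In the two-outcome example, the experts' log odds are $0$ and $\log 4$, so the equally weighted average is $\log 2$, which corresponds to a pooled probability of $2/3$ for the first outcome.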