In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred. Following the recent line of research on algorithms endowed with additional predictive features, we revisit this problem by allowing the decision maker to acquire additional information on the actions to be selected. In particular, we study the power of \emph{best-action queries}, which reveal beforehand the identity of the best action at a given time step. In practice, predictive features may be expensive, so we allow the decision maker to issue at most $k$ such queries. We establish tight bounds on the performance any algorithm can achieve when given access to $k$ best-action queries for different types of feedback models. In particular, we prove that in the full feedback model, $k$ queries are enough to achieve an optimal regret of $\Theta\left(\min\left\{\sqrt T, \frac Tk\right\}\right)$. This finding highlights the significant multiplicative advantage in the regret rate achievable with even a modest (sublinear) number $k \in \Omega(\sqrt{T})$ of queries. Additionally, we study the challenging setting in which the only available feedback is obtained during the time steps corresponding to the $k$ best-action queries. There, we provide a tight regret rate of $\Theta\left(\min\left\{\frac{T}{\sqrt k},\frac{T^2}{k^2}\right\}\right)$, which improves over the standard $\Theta\left(\frac{T}{\sqrt k}\right)$ regret rate for label efficient prediction for $k \in \Omega(T^{2/3})$.
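The two thresholds quoted above, $k \in \Omega(\sqrt{T})$ and $k \in \Omega(T^{2/3})$, can be recovered by comparing the two terms inside each minimum. The following is a sketch of that comparison only, ignoring constant factors; it is not taken from the proofs. The numerical values $T = 10^6$ and $k = 10^5$ are illustrative assumptions.

\begin{align*}
\frac{T}{k} \le \sqrt{T} \;&\iff\; k \ge \sqrt{T}, \\
\frac{T^2}{k^2} \le \frac{T}{\sqrt{k}} \;&\iff\; k^{3/2} \ge T \;\iff\; k \ge T^{2/3}.
\end{align*}

For instance, with $T = 10^6$ and $k = 10^5$, the full feedback rate is of order $\min\{10^3, 10\} = 10$, while the rate with feedback only at query steps is of order $\min\{T/\sqrt{k},\, T^2/k^2\} \approx \min\{3.2 \cdot 10^3,\, 10^2\} = 10^2$. At $k = T^{2/3} = 10^4$ the two terms of the second minimum coincide at $10^4$.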