Deterministic Apple Tasting

In binary ($0/1$) online classification with apple tasting feedback, the learner receives feedback only when predicting $1$. Apart from some degenerate learning tasks, all previously known learning algorithms for this model are randomized. Consequently, prior to this work it was unknown whether deterministic apple tasting is generally feasible. In this work, we provide the first widely applicable deterministic apple tasting learner, and show that in the realizable case, a hypothesis class is learnable if and only if it is deterministically learnable, confirming a conjecture of [Raman, Subedi, Raman, Tewari-24]. Quantitatively, we show that every class $\mathcal{H}$ is learnable with mistake bound $O \left(\sqrt{\mathtt{L}(\mathcal{H}) T \log T} \right)$ (where $\mathtt{L}(\mathcal{H})$ is the Littlestone dimension of $\mathcal{H}$), and that this is tight for some classes. We further study the agnostic case, in which the best hypothesis makes at most $k$ mistakes, and prove a trichotomy stating that every class $\mathcal{H}$ must be either easy, hard, or unlearnable. Easy classes have (both randomized and deterministic) mistake bound $\Theta_{\mathcal{H}}(k)$. Hard classes have randomized mistake bound $\tilde{\Theta}_{\mathcal{H}} \left(k + \sqrt{T} \right)$ and deterministic mistake bound $\tilde{\Theta}_{\mathcal{H}} \left(\sqrt{k \cdot T} \right)$, where $T$ is the time horizon. Unlearnable classes have (both randomized and deterministic) mistake bound $\Theta(T)$. Our upper bound is based on a deterministic algorithm for learning from expert advice with apple tasting feedback, a problem interesting in its own right. For this problem, we show that the optimal deterministic mistake bound is $\Theta \left(\sqrt{T (k + \log n)} \right)$ for all $k$ and $T \leq n \leq 2^T$, where $n$ is the number of experts.
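To make the feedback model concrete, here is a minimal Python sketch of the apple tasting protocol only. It is not the learner from this work: the names `run_apple_tasting`, `AlwaysTaste`, and `NeverTaste` are hypothetical, and the two baseline learners are chosen only to show when the true label is revealed.

```python
from typing import Iterable, Tuple


class AlwaysTaste:
    """Hypothetical baseline: always predicts 1, so the label is revealed every round."""

    def predict(self, x) -> int:
        return 1

    def update(self, x, label: int) -> None:
        pass  # a real learner would use the revealed label here


class NeverTaste:
    """Hypothetical baseline: always predicts 0, so no label is ever revealed."""

    def predict(self, x) -> int:
        return 0

    def update(self, x, label: int) -> None:
        pass  # never called under apple tasting feedback


def run_apple_tasting(learner, stream: Iterable[Tuple[object, int]]) -> Tuple[int, int]:
    """Run the protocol and return (mistakes, rounds in which feedback was given)."""
    mistakes = 0
    feedback_rounds = 0
    for x, y in stream:
        y_hat = learner.predict(x)
        if y_hat != y:
            mistakes += 1  # counted by the environment, not necessarily seen by the learner
        if y_hat == 1:
            # Apple tasting feedback: the true label is revealed only on a 1-prediction.
            learner.update(x, y)
            feedback_rounds += 1
    return mistakes, feedback_rounds
```

On a stream with two positive and two negative labels, `AlwaysTaste` observes every label but errs on each negative example, while `NeverTaste` errs on each positive example and learns nothing. A useful learner has to trade off between these two extremes.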
