On Randomized Algorithms in Online Strategic Classification

Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such behavior. Although randomized algorithms have the potential to offer advantages to the learner in strategic settings, they remain largely unexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}(T|\mathcal H|))$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings; our bounds depend on the Littlestone dimension $\mathrm{Ldim}(\mathcal H)$ of the hypothesis class $\mathcal H$ and the maximum degree $\Delta$ of the manipulation graph. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal H) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal H) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We then provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot \Delta\log \Delta)$. In the agnostic setting, we give an improper randomized learner that improves the regret upper bound to $O(\sqrt{T\log|\mathcal H|})$, matching the standard online learning rate. We also show a larger lower bound for all proper learning rules, demonstrating that improperness is necessary to achieve the optimal rate.
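To make the setting concrete, the following minimal Python sketch illustrates how an agent might respond to a published classifier on a manipulation graph. It is not the paper's algorithm. It assumes a common model in which an agent with a negative prediction may move to any neighbor in the graph that is classified positive, and otherwise stays put. The names `best_response` and `max_degree`, the toy credit-score graph, and the threshold classifier are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Hashable, List

Node = Hashable


def best_response(
    x: Node,
    graph: Dict[Node, List[Node]],
    h: Callable[[Node], bool],
) -> Node:
    """Return the feature an agent at x reports under classifier h.

    Assumed model: the agent stays at x if h(x) is already positive;
    otherwise it moves to the first neighbor that h labels positive,
    and stays at x if no such neighbor exists.
    """
    if h(x):
        return x
    for v in graph.get(x, []):
        if h(v):
            return v
    return x


def max_degree(graph: Dict[Node, List[Node]]) -> int:
    """Maximum out-degree of the manipulation graph (the role played by Delta)."""
    return max((len(nbrs) for nbrs in graph.values()), default=0)


# Hypothetical manipulation graph over credit scores: each score can be
# nudged to an adjacent bracket.
graph = {600: [650], 650: [600, 700], 700: [650, 750], 750: [700]}


def h(score: int) -> bool:
    """Hypothetical threshold classifier: approve scores of at least 700."""
    return score >= 700


# What the learner observes for each true feature value.
observed = {x: best_response(x, graph, h) for x in graph}
```

In this toy instance an agent at 650 reports 700 and is approved, while an agent at 600 cannot reach a positive node and reports its true feature. The learner sees only the reported features, which is what makes mistake and regret bounds in this setting depend on the graph's maximum degree.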
