On Randomized Algorithms in Online Strategic Classification

Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such strategic behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they remain largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}(T|\mathcal H|))$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings. In the realizable setting, we extend the existing lower bound of $\Omega(\mathrm{Ldim}(\mathcal{H}) \Delta)$ for deterministic learners to all learners whenever $T > \mathrm{Ldim}(\mathcal{H}) \Delta^2$. This yields the first lower bound that applies to randomized learners. We also provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot \Delta\log \Delta)$. In the agnostic setting, we give a proper learner that uses convex optimization techniques to improve the regret upper bound to $O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$. We show a lower bound for all proper learning rules that matches up to logarithmic factors, demonstrating the optimality of our learner among proper learners. As such, improper learning is necessary to further improve regret guarantees.
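To make the interaction concrete, the following is a minimal sketch of a single round in the loan example. It is an illustration under stated assumptions, not the protocol or learner analyzed in this work: we assume that feasible manipulations are given by a directed graph over features (the hypothetical `neighbors` map), that an agent manipulates only when doing so flips a negative prediction to a positive one, and that the learner is charged a mistake when its prediction on the reported feature differs from the true label. The names `best_response` and `play_round` and the three credit-score buckets are invented for the example.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Feature = Hashable
Classifier = Callable[[Feature], int]  # returns 1 (approve) or 0 (reject)


def best_response(x: Feature, h: Classifier,
                  neighbors: Dict[Feature, List[Feature]]) -> Feature:
    """Feature the agent reports (assumed behavior, see lead-in)."""
    if h(x) == 1:
        return x  # already approved, so there is no reason to manipulate
    for v in neighbors.get(x, []):
        if h(v) == 1:
            return v  # move to a reachable feature that gets approved
    return x  # no reachable feature is approved, so stay put


def play_round(x: Feature, y: int, h: Classifier,
               neighbors: Dict[Feature, List[Feature]]) -> Tuple[Feature, int]:
    """Return the reported feature and 1 if the learner erred, else 0."""
    v = best_response(x, h, neighbors)
    return v, int(h(v) != y)


# Hypothetical manipulation graph: an applicant can raise the score by one bucket.
neighbors = {"low": ["mid"], "mid": ["high"], "high": []}


def approve_mid_or_high(score: Feature) -> int:
    return 1 if score in {"mid", "high"} else 0


# A "low" applicant with true label 0 games the classifier, causing a mistake.
reported, mistake = play_round("low", 0, approve_mid_or_high, neighbors)
print(reported, mistake)
```

In this toy graph every feature has at most one outgoing edge. Reading $\Delta$ as the maximum out-degree of such a manipulation graph is an assumption on our part, since the abstract does not define it.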
