In repeated interaction problems with adaptive agents, our objective often requires anticipating and optimizing over the space of possible agent responses. We show that many problems of this form can be cast as instances of online (nonlinear) control which satisfy \textit{local controllability}, with convex losses over a bounded state space which encodes agent behavior, and we introduce a unified algorithmic framework for tractable regret minimization in such cases. When the instance dynamics are known but otherwise arbitrary, we obtain oracle-efficient $O(\sqrt{T})$ regret by reduction to online convex optimization, which can be made computationally efficient if dynamics are locally \textit{action-linear}. In the presence of adversarial disturbances to the state, we give tight bounds in terms of either the cumulative or per-round disturbance magnitude (for \textit{strongly} or \textit{weakly} locally controllable dynamics, respectively). Additionally, we give sublinear regret results for the cases of unknown locally action-linear dynamics as well as for the bandit feedback setting. Finally, we demonstrate applications of our framework to well-studied problems including performative prediction, recommendations for adaptive agents, adaptive pricing of real-valued goods, and repeated gameplay against no-regret learners, directly yielding extensions beyond prior results in each case.
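As a point of reference for the $O(\sqrt{T})$ rate, the display below recalls the textbook guarantee for projected online gradient descent in online convex optimization. It is a sketch of the standard result that such a reduction could target, not the algorithm or analysis of this work. The symbols $\mathcal{K}$ (a convex decision set of diameter $D$), $f_t$ (convex losses with gradient norms at most $G$), $x_t$ (the iterates), and $\eta$ (the step size) are assumptions introduced here for illustration.

\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x),
\qquad
x_{t+1} \;=\; \Pi_{\mathcal{K}}\bigl(x_t - \eta \nabla f_t(x_t)\bigr),
\]
\[
\mathrm{Regret}_T \;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2} \;=\; G D \sqrt{T}
\quad \text{for } \eta = \frac{D}{G\sqrt{T}}.
\]

The two terms trade off against each other: a larger step size shrinks the first term and grows the second, and balancing them yields the $\sqrt{T}$ dependence.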