We study the dynamics of repeated fair division between two players, Alice and Bob, where Alice partitions a cake into two subsets and Bob chooses his preferred one over $T$ rounds. Alice aims to minimize her regret relative to the Stackelberg value -- the maximum utility she could achieve if she knew Bob's private valuation. We show that if Alice uses arbitrary measurable partitions, achieving strongly sublinear regret is impossible: she suffers regret of $\Omega\Bigl(\frac{T}{\log^2 T}\Bigr)$ even against a myopic Bob. However, when Alice uses at most $k$ cuts, the learning landscape becomes tractable. We analyze Alice's performance based on her knowledge of Bob's strategic sophistication (his regret budget). When Bob's learning rate is public, we establish a hierarchy of polynomial regret bounds determined by $k$ and Bob's regret budget. In contrast, when this learning rate is private, Alice can universally guarantee $O\Bigl(\frac{T}{\log T}\Bigr)$ regret, but any attempt to secure a polynomial rate $O(T^{\beta})$ (for $\beta < 1$) leaves her vulnerable to incurring strictly linear regret against some Bob. Finally, as a corollary of our online learning dynamics, we characterize the randomized query complexity of finding approximate Stackelberg allocations with a constant number of cuts in the Robertson-Webb model.