Recent years have seen a surge of artificial currency-based mechanisms in contexts where monetary instruments are deemed unfair or inappropriate, e.g., in allocating food donations to food banks and course seats to students and, more recently, in managing traffic congestion. Yet the applicability of these mechanisms remains limited in repeated auction settings, as it is challenging for users to learn how to bid an artificial currency that has no value outside the auctions. Indeed, users must learn the value of the currency jointly with how to spend it optimally. In this work, we study the problem of learning to bid in two prominent classes of artificial currency auctions: those in which currency, which users spend to obtain public resources, is only issued at the beginning of a finite period; and those where, in addition to the initial currency endowment, currency payments are redistributed to users at each time step. In the latter class, the currency has been referred to as karma, since users not only spend karma to obtain public resources but also gain karma for yielding them. In both classes, we propose a simple learning strategy, called adaptive karma pacing, and show that this strategy a) is asymptotically optimal for a single user bidding against competing bids drawn from a stationary distribution; b) leads to convergent learning dynamics when all users adopt it; and c) constitutes an approximate Nash equilibrium as the number of users grows. Our results require a novel analysis compared with adaptive pacing strategies in monetary auctions, since we depart from the classical assumption that the currency has known value outside the auctions, and moreover consider that the currency is both spent and gained in the class of auctions with redistribution.
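For orientation, the classical adaptive pacing idea from monetary auctions, which the abstract uses as its point of comparison, can be sketched as follows. This is not the adaptive karma pacing strategy itself, whose update rule is not given here. It is a minimal illustration under stated assumptions: a single bidder in repeated second-price auctions, a currency with known outside value (exactly the assumption the work drops), no redistribution, and hypothetical names and parameters (`adaptive_pacing`, `simulate`, `step`, `mu_max`, uniform values and competing bids).

```python
import random


def adaptive_pacing(values, competing_bids, budget, step=0.05, mu_max=10.0):
    """Classical (monetary) adaptive pacing sketch, NOT the karma variant.

    The bidder shades each bid by a multiplier mu and nudges mu up when a
    round's spend exceeds the target rate budget / horizon, down otherwise.
    """
    horizon = len(values)
    target = budget / horizon  # target expenditure per round
    mu = 0.0                   # pacing multiplier (assumed to start at 0)
    remaining = budget
    utility = 0.0
    for v, d in zip(values, competing_bids):
        # Shade the bid and never bid more than the remaining budget.
        bid = min(v / (1.0 + mu), remaining)
        win = bid > d
        payment = d if win else 0.0  # second-price payment
        if win:
            utility += v - payment
        remaining -= payment
        # Projected subgradient step on the multiplier.
        mu = min(max(mu + step * (payment - target), 0.0), mu_max)
    return utility, budget - remaining, mu


def simulate(seed=0, horizon=1000, budget=50.0):
    """Hypothetical test bed: i.i.d. uniform values and competing bids."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(horizon)]
    competing = [rng.random() for _ in range(horizon)]
    return adaptive_pacing(values, competing, budget)
```

Calling `simulate()` returns the accumulated utility, the total spend, and the final multiplier. In the artificial currency setting the utility term `v - payment` is no longer meaningful, because the currency has no value outside the auctions, and with redistribution the budget is also replenished; these are the two points on which the abstract says a new analysis is needed.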