Double Auctions enable decentralized transfer of goods between multiple buyers and sellers, thus underpinning the functioning of many online marketplaces. Buyers and sellers compete in these markets through bidding, but often do not know their own valuations a priori. As allocation and pricing happen through bids, the profitability of participants, and hence the sustainability of such markets, depends crucially on learning their respective valuations through repeated interactions. We initiate the study of Double Auction markets under bandit feedback on both the buyers' and the sellers' sides. We show that with confidence-bound-based bidding and `Average Pricing', there is efficient price discovery among the participants. In particular, the regret on the combined valuation of the buyers and the sellers, also known as the social regret, is $O(\log(T)/\Delta)$ in $T$ rounds, where $\Delta$ is the minimum price gap. Moreover, the buyers and sellers who exchange goods each attain $O(\sqrt{T})$ individual regret, while the buyers and sellers who do not benefit from exchange incur only $O(\log(T)/\Delta)$ individual regret in $T$ rounds. We complement our upper bounds by showing that, in certain Double Auction markets, $o(\sqrt{T})$ individual regret and $o(\log T)$ social regret are unattainable. Our paper is the first to provide decentralized learning algorithms in a two-sided market where \emph{both sides have uncertain preferences} that need to be learned.
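To make the mechanism concrete, the following Python sketch simulates a toy market under several assumptions that are not taken from the abstract and may differ from the paper's exact rules. Valuations lie in $[0,1]$, and a participant observes a noisy (Gaussian) sample of its own valuation only in rounds where it trades (bandit feedback). Buyers bid a clipped upper confidence bound and sellers a clipped lower confidence bound with radius $\sqrt{2\ln t / n}$. `Average Pricing' is taken here to mean: match the $K$ highest buyer bids with the $K$ lowest seller bids and charge the midpoint of the $K$-th buyer bid and the $K$-th seller bid. The function names `clear_market`, `confidence_bid`, and `simulate` are hypothetical.

```python
import math
import random


def clear_market(buyer_bids, seller_bids):
    """Match the highest buyer bids with the lowest seller bids.

    Assumed 'average pricing' variant: K is the largest k such that the
    k-th highest buyer bid is at least the k-th lowest seller bid; the K
    winners on each side trade at the midpoint of those two K-th bids.
    Returns (winning buyer indices, winning seller indices, price or None).
    """
    buyers = sorted(range(len(buyer_bids)), key=lambda i: -buyer_bids[i])
    sellers = sorted(range(len(seller_bids)), key=lambda j: seller_bids[j])
    k = 0
    while (k < min(len(buyers), len(sellers))
           and buyer_bids[buyers[k]] >= seller_bids[sellers[k]]):
        k += 1
    if k == 0:
        return [], [], None
    price = (buyer_bids[buyers[k - 1]] + seller_bids[sellers[k - 1]]) / 2
    return buyers[:k], sellers[:k], price


def confidence_bid(total, count, t, upper):
    """Optimistic bid in [0, 1]: UCB for buyers (upper=True), LCB for sellers."""
    if count == 0:  # never traded: bid the most optimistic value
        return 1.0 if upper else 0.0
    mean = total / count
    radius = math.sqrt(2 * math.log(t) / count)
    bid = mean + radius if upper else mean - radius
    return min(1.0, max(0.0, bid))


def simulate(buyer_vals, seller_vals, rounds, noise=0.1, seed=0):
    """Return (cumulative social regret, buyer trade counts, seller trade counts)."""
    rng = random.Random(seed)
    b_tot, b_cnt = [0.0] * len(buyer_vals), [0] * len(buyer_vals)
    s_tot, s_cnt = [0.0] * len(seller_vals), [0] * len(seller_vals)
    # Welfare-maximizing trade set, computed from the true (hidden) valuations.
    opt_b, opt_s, _ = clear_market(buyer_vals, seller_vals)
    opt = sum(buyer_vals[i] for i in opt_b) - sum(seller_vals[j] for j in opt_s)
    regret = 0.0
    for t in range(1, rounds + 1):
        b_bids = [confidence_bid(b_tot[i], b_cnt[i], t, True)
                  for i in range(len(buyer_vals))]
        s_bids = [confidence_bid(s_tot[j], s_cnt[j], t, False)
                  for j in range(len(seller_vals))]
        win_b, win_s, _price = clear_market(b_bids, s_bids)
        # The price is a transfer between traders, so it cancels in social welfare.
        welfare = (sum(buyer_vals[i] for i in win_b)
                   - sum(seller_vals[j] for j in win_s))
        regret += opt - welfare
        # Bandit feedback: only participants who trade observe a noisy sample.
        for i in win_b:
            b_tot[i] += buyer_vals[i] + rng.gauss(0.0, noise)
            b_cnt[i] += 1
        for j in win_s:
            s_tot[j] += seller_vals[j] + rng.gauss(0.0, noise)
            s_cnt[j] += 1
    return regret, b_cnt, s_cnt


regret, b_cnt, s_cnt = simulate([0.9, 0.7, 0.3], [0.2, 0.5, 0.8], rounds=2000)
print(regret, b_cnt, s_cnt)
```

In this toy market the third buyer (valuation 0.3) and the third seller (valuation 0.8) should not trade. Under the assumed rules they are matched only while their confidence intervals are still wide, so their trade counts should grow roughly logarithmically while the other participants trade almost every round. Because the price cancels in social welfare, it affects only individual utilities, not the social regret tracked here.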