We consider the problem of learning stable matchings in a fully decentralized and uncoordinated manner. In this problem, there are $n$ men and $n$ women, each with preferences over the members of the other side. It is assumed that women know their preferences over men, whereas men do not know their preferences over women and learn them only by proposing to women and being successfully matched. A matching is called stable if no man and woman both prefer each other to their current matches. When all preferences are known a priori, the celebrated Deferred-Acceptance algorithm of Gale and Shapley provides a decentralized and uncoordinated way to obtain a stable matching. However, when the preferences are unknown, developing such an algorithm faces major challenges due to the lack of coordination. We achieve this goal by making a connection between stable matchings and learning Nash equilibria (NE) in noncooperative games. First, we provide a complete-information game formulation of the stable matching problem with known preferences such that its set of pure NE coincides with the set of stable matchings, while its mixed NE can be rounded in a decentralized manner to a stable matching. Relying on this game-theoretic formulation, we show that for hierarchical markets, adopting the exponential weight (EXP) learning algorithm for the stable matching game achieves logarithmic regret with polynomial dependence on the number of players, thus answering a question posed in previous literature. Moreover, we show that the same EXP learning algorithm converges locally and exponentially fast to a stable matching in general matching markets. We complement this result by introducing another decentralized and uncoordinated learning algorithm that converges globally to a stable matching with arbitrarily high probability, leveraging the weak acyclicity property of the stable matching game.
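To make the known-preference baseline concrete, the following is a minimal Python sketch of men-proposing Deferred Acceptance together with a check of the stability condition stated above. It is an illustration only, not the learning algorithms studied in this work: it assumes every preference list is fully known, and the function names (`deferred_acceptance`, `is_stable`) and the three-agent preference lists are hypothetical choices made for this example.

```python
def deferred_acceptance(men_prefs, women_prefs):
    """Men-proposing Deferred Acceptance, assuming all preferences are known.

    men_prefs[m] lists women from most to least preferred by man m;
    women_prefs[w] lists men from most to least preferred by woman w.
    Returns a dict mapping each man to his matched woman.
    """
    n = len(men_prefs)
    # rank[w][m] is the position of man m in woman w's list (lower is better).
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_choice = [0] * n            # index of the next woman each man proposes to
    partner_of_woman = [None] * n    # current tentative partner of each woman
    free_men = list(range(n))

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = partner_of_woman[w]
        if current is None:
            partner_of_woman[w] = m          # w was unmatched and accepts
        elif rank[w][m] < rank[w][current]:
            partner_of_woman[w] = m          # w trades up; her old partner is free
            free_men.append(current)
        else:
            free_men.append(m)               # w rejects m; he proposes again later

    return {m: w for w, m in enumerate(partner_of_woman)}


def is_stable(matching, men_prefs, women_prefs):
    """Return True if no man and woman both prefer each other to their matches."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in enumerate(men_prefs):
        for w in prefs:
            if w == matching[m]:
                break  # the remaining women are ranked below m's current match
            # m prefers w to his match; the pair blocks if w also prefers m.
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


# Hypothetical 3x3 market used only for illustration.
men_prefs = [[0, 1, 2], [1, 0, 2], [0, 1, 2]]
women_prefs = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]

matching = deferred_acceptance(men_prefs, women_prefs)
print(matching)                                     # {0: 0, 1: 1, 2: 2}
print(is_stable(matching, men_prefs, women_prefs))  # True
```

In the setting considered here, the men's lists in `men_prefs` would not be available up front, which is why the sketch serves only as the full-information reference point.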