We propose a new algorithm for matching markets with complementary preferences, in which agents' preferences are unknown a priori and must be learned from data. Complementary preferences can make the matching process unstable, which makes the problem challenging. To address this, we formulate the problem in a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm, which combines Thompson Sampling for exploration with a double matching technique to reach a stable matching. Our theoretical analysis shows that MMTS achieves stability at every matching step, satisfies incentive compatibility, and attains sublinear Bayesian regret over time. The approach offers a practical way to handle complementary preferences in real-world scenarios.
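To make the general idea concrete, the following is a minimal sketch of Thompson Sampling combined with a stable matching step. It is not the MMTS algorithm itself: it handles a single type of arm, so it does not model complementary preferences, and it uses one round of agent-proposing deferred acceptance in place of the double matching technique. The Gaussian prior, the known noise level, the assumption that arm-side preferences are known, and the names `deferred_acceptance` and `thompson_matching` are all illustrative assumptions.

```python
import random


def deferred_acceptance(agent_prefs, arm_prefs):
    """Agent-proposing Gale-Shapley; returns a dict mapping agent -> arm.

    agent_prefs[a] lists arms, most preferred first.
    arm_prefs[k] lists agents, most preferred first.
    Assumes equal numbers of agents and arms with complete preference lists.
    """
    rank = [{a: r for r, a in enumerate(prefs)} for prefs in arm_prefs]
    next_choice = [0] * len(agent_prefs)  # next arm each agent will propose to
    holder = {}  # arm -> agent it currently holds
    free = list(range(len(agent_prefs)))
    while free:
        a = free.pop()
        k = agent_prefs[a][next_choice[a]]
        next_choice[a] += 1
        if k not in holder:
            holder[k] = a
        elif rank[k][a] < rank[k][holder[k]]:
            free.append(holder[k])  # arm k trades up; old agent is free again
            holder[k] = a
        else:
            free.append(a)  # proposal rejected
    return {a: k for k, a in holder.items()}


def thompson_matching(true_means, arm_prefs, rounds, noise_sd=1.0, seed=0):
    """Simplified single-type Thompson Sampling loop (illustrative only).

    Assumptions: N(0, 1) prior on every agent-arm mean utility, Gaussian
    reward noise with known standard deviation, arm preferences known.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(true_means), len(true_means[0])
    post_mean = [[0.0] * n_arms for _ in range(n_agents)]
    post_prec = [[1.0] * n_arms for _ in range(n_agents)]
    obs_prec = 1.0 / noise_sd ** 2
    matching = {}
    for _ in range(rounds):
        # 1. Sample a utility for every pair and rank arms by the sample.
        sampled_prefs = []
        for a in range(n_agents):
            draws = [rng.gauss(post_mean[a][k], post_prec[a][k] ** -0.5)
                     for k in range(n_arms)]
            sampled_prefs.append(sorted(range(n_arms), key=lambda k: -draws[k]))
        # 2. Compute a matching that is stable for the sampled preferences.
        matching = deferred_acceptance(sampled_prefs, arm_prefs)
        # 3. Observe noisy rewards and apply the conjugate Gaussian update.
        for a, k in matching.items():
            reward = rng.gauss(true_means[a][k], noise_sd)
            new_prec = post_prec[a][k] + obs_prec
            post_mean[a][k] = (post_prec[a][k] * post_mean[a][k]
                               + obs_prec * reward) / new_prec
            post_prec[a][k] = new_prec
    return matching, post_mean
```

In this sketch each round's matching is stable only with respect to the sampled preferences of that round. Extending it to several arm types with complementary preferences would require the paper's double matching step, which is not reproduced here.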