Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. Classical notions of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) are unfortunately of limited value in the learning setting, given that preferences are inherently uncertain and destabilizing while they are being learned. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that "optimism in the face of uncertainty," the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
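As a rough illustration of how optimism can be paired with a primal-dual view of matching with transfers, the sketch below inflates estimated pair utilities with a confidence bonus, solves the primal problem (a maximum-weight matching), and recovers dual payoffs that split each matched pair's utility. Everything here is an assumption made for illustration rather than the algorithm from the paper: the function names are hypothetical, the bonus is the standard UCB1 form, the matching is brute-forced and so only suits tiny square markets with nonnegative utilities, and `max_blocking_gap` is a simple pairwise check based on the Shapley-Shubik stability conditions, not the paper's learning objective.

```python
import itertools
import math


def ucb(mean, pulls, t):
    """Optimistic estimate: empirical mean plus a UCB1-style bonus (assumed form)."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def max_weight_matching(w):
    """Primal step: brute-force the best perfect matching of an n x n matrix.

    Returns a list `match` where row agent i is paired with column agent match[i].
    """
    n = len(w)
    best = max(itertools.permutations(range(n)),
               key=lambda p: sum(w[i][p[i]] for i in range(n)))
    return list(best)


def dual_payoffs(w, match):
    """Dual step: payoffs (u, v) with u[i] + v[j] >= w[i][j] for all pairs,
    and equality on matched pairs. Assumes `match` is optimal for `w`
    and that all entries of `w` are nonnegative.
    """
    n = len(w)
    owner = {match[i]: i for i in range(n)}  # column k -> row matched to k
    v = [0.0] * n
    # Bellman-Ford-style relaxation of v[j] >= v[k] + w[i][j] - w[i][k],
    # where i is the row matched to column k.
    for _ in range(n):
        for k in range(n):
            i = owner[k]
            for j in range(n):
                v[j] = max(v[j], v[k] + w[i][j] - w[i][k])
    u = [w[i][match[i]] - v[match[i]] for i in range(n)]
    return u, v


def max_blocking_gap(w, u, v):
    """Largest surplus a pair could gain by deviating; <= 0 means no blocking pair."""
    n = len(w)
    return max(w[i][j] - u[i] - v[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical empirical means and pull counts for a 2 x 2 market.
    means = [[0.6, 0.2], [0.3, 0.5]]
    pulls = [[5, 5], [5, 5]]
    t = 20
    w_ucb = [[ucb(means[i][j], pulls[i][j], t) for j in range(2)] for i in range(2)]
    match = max_weight_matching(w_ucb)
    u, v = dual_payoffs(w_ucb, match)
    print("matching:", match)
    print("payoffs:", u, v)
    print("gap under optimistic utilities:", max_blocking_gap(w_ucb, u, v))
```

The payoffs are computed from the optimistic utilities, so the outcome is stable with respect to those estimates by construction. How far it is from stable under the true utilities depends on how tight the confidence bonuses are, which is the kind of gap a regret analysis would need to control.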