Matching markets increasingly need to learn the match quality between demand and supply in order to design effective matching policies. In practice, matching rewards are high-dimensional because of the growing diversity of participants. We leverage the natural low-rank structure of the reward matrix in these two-sided markets and propose using matrix completion to accelerate reward learning from limited offline data. A distinctive feature of matrix completion in this setting is that entries of the reward matrix are observed under matching interference: entries are not sampled independently, but are dependent because of matching or budget constraints. This dependence poses unique technical challenges, since existing analytical tools in the matrix completion literature typically rely on sample independence and are therefore either sub-optimal or inapplicable. In this paper, we first show that standard nuclear norm regularization remains theoretically effective under matching interference, and we provide a near-optimal Frobenius norm guarantee in this setting using a new analytical technique. Next, to guide certain matching decisions, we develop a novel ``double-enhanced'' estimator, built on the nuclear norm estimator, with a near-optimal entry-wise guarantee. Our double-enhancement procedure applies to broader sampling schemes, even those with dependence, which may be of independent interest. Additionally, we extend our approach to online learning settings with matching constraints, such as optimal matching and stable matching, and present regret bounds with improved dependence on the matrix dimensions. Finally, we demonstrate the practical value of our methods using both synthetic data and real labor-market data.
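To make the setting concrete, the following is a minimal sketch of nuclear norm regularized matrix completion when observations arrive as matchings. It is not the paper's double-enhanced estimator or its analysis. It only illustrates the baseline penalized least-squares problem, solved here with a standard proximal gradient loop (singular value soft-thresholding). Several things are assumptions made for illustration: every round reveals one perfect matching drawn as a uniform random permutation, the noise is Gaussian, the penalty `lam` is hand-picked, and all function names are hypothetical.

```python
import numpy as np


def simulate_matched_observations(theta, num_rounds, noise_std, rng):
    """Observe one perfect matching per round (assumed: a uniform random
    permutation), so entries within a round are dependent by construction."""
    n = theta.shape[0]
    counts = np.zeros((n, n))        # how often each (row, column) pair was matched
    reward_sums = np.zeros((n, n))   # sum of noisy rewards seen for each pair
    rows = np.arange(n)
    for _ in range(num_rounds):
        cols = rng.permutation(n)    # row i is matched to column cols[i]
        noisy = theta[rows, cols] + noise_std * rng.standard_normal(n)
        counts[rows, cols] += 1
        reward_sums[rows, cols] += noisy
    return counts, reward_sums


def svd_soft_threshold(matrix, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def nuclear_norm_estimate(counts, reward_sums, lam, num_iters=300):
    """Proximal gradient for
        min_X  0.5 * sum over observations (y - X_ij)^2 + lam * ||X||_*
    The smooth part has gradient counts * X - reward_sums (elementwise)."""
    step = 1.0 / max(counts.max(), 1.0)   # 1 / Lipschitz constant of the gradient
    x = np.zeros_like(reward_sums)
    for _ in range(num_iters):
        grad = counts * x - reward_sums
        x = svd_soft_threshold(x - step * grad, step * lam)
    return x


def demo(seed=0, n=30, rank=2, num_rounds=20, noise_std=0.1, lam=1.0):
    """Return the relative Frobenius error on a synthetic low-rank reward matrix."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
    counts, reward_sums = simulate_matched_observations(
        theta, num_rounds, noise_std, rng
    )
    theta_hat = nuclear_norm_estimate(counts, reward_sums, lam)
    return np.linalg.norm(theta_hat - theta) / np.linalg.norm(theta)


print("relative Frobenius error:", demo())
```

Because each round is a permutation, every row and every column is observed exactly once per round. This is the kind of dependence that independent entry-wise sampling models do not capture. In practice `lam` would be tuned, for example by cross-validation, rather than fixed as it is here.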