Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, and must be learned from data. With the growing availability of dynamic side information in the decision process, modern online matching methodology must be able to track agents' shifting preferences based on contextual information. This motivates us to propose a novel framework for the dynamic online matching problem with contextual information, which allows for dynamic preferences in matching decisions. Existing works focus on online matching with static preferences, but this is insufficient: two-sided preferences change as soon as one side's contextual information updates, resulting in non-static matching. In this paper, we propose a dynamic matching bandit algorithm to address this problem. The key component of the proposed dynamic matching algorithm is an online estimation of the preference ranking with a statistical guarantee. Theoretically, we show that the proposed dynamic matching algorithm delivers an agent-optimal stable matching result with high probability. In particular, we prove a logarithmic regret upper bound $\mathcal{O}(\log(T))$ and construct a corresponding instance-dependent matching regret lower bound. In the experiments, we demonstrate that the dynamic matching algorithm is robust to various preference schemes, dimensions of contexts, reward noise levels, and context variation levels, and its application to a job-seeking market further demonstrates the practical usage of the proposed method.
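To make the notion of an agent-optimal stable matching concrete, the following minimal sketch ranks arms by estimated preference scores and then runs the classical agent-proposing deferred acceptance (Gale-Shapley) procedure. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the score matrix `estimated_scores`, the arm-side rankings `arm_rankings`, and both function names are hypothetical, and the online estimation step that would produce the scores from contexts is not shown.

```python
def rank_from_scores(scores):
    """Return arm indices sorted by estimated score, best first."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def agent_proposing_deferred_acceptance(agent_rankings, arm_rankings):
    """Classical Gale-Shapley with agents proposing; returns {agent: arm}.

    agent_rankings[i] lists arms from most to least preferred by agent i.
    arm_rankings[j] lists agents from most to least preferred by arm j.
    Assumes equally many agents and arms and complete, strict rankings.
    """
    n = len(agent_rankings)
    # arm_position[j][i] is the rank of agent i in arm j's list (lower is better).
    arm_position = [{a: r for r, a in enumerate(ranking)} for ranking in arm_rankings]
    next_choice = [0] * n  # index of the next arm each agent will propose to
    holder = {}            # arm -> agent it currently holds
    free = list(range(n))  # agents that are currently unmatched
    while free:
        i = free.pop()
        j = agent_rankings[i][next_choice[i]]
        next_choice[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_position[j][i] < arm_position[j][holder[j]]:
            free.append(holder[j])  # arm j trades up and releases its old agent
            holder[j] = i
        else:
            free.append(i)          # arm j rejects agent i
    return {i: j for j, i in holder.items()}


# Hypothetical estimated scores at one time step (rows: agents, columns: arms).
estimated_scores = [
    [0.9, 0.5, 0.1],
    [0.8, 0.7, 0.2],
    [0.3, 0.6, 0.4],
]
# Hypothetical arm-side rankings over agents, assumed known here.
arm_rankings = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]

agent_rankings = [rank_from_scores(row) for row in estimated_scores]
matching = agent_proposing_deferred_acceptance(agent_rankings, arm_rankings)
```

In a dynamic setting, the scores would be re-estimated whenever the contexts change and the matching recomputed from the updated rankings; how the paper performs that estimation is not specified in this abstract.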