A matching platform is a system that matches different types of participants, such as companies and job-seekers. On such a platform, merely maximizing the number of matches can concentrate matches on highly popular participants, which may increase dissatisfaction among the other participants, such as companies, and ultimately lead to their churn, reducing the platform's profit opportunities. To address this issue, we propose a novel online learning problem, Combinatorial Allocation Bandits (CAB), which incorporates the notion of *arm satisfaction*. In CAB, at each round $t=1,\dots,T$, the learner observes, for each of $N$ users, $K$ feature vectors corresponding to the $K$ arms, assigns each user to an arm, and then observes feedback that follows a generalized linear model (GLM). Unlike prior work, the learner's objective is not to maximize the amount of positive feedback but to maximize arm satisfaction. For CAB, we provide an upper confidence bound (UCB) algorithm that achieves an approximate regret upper bound matching the existing lower bound in a special case. Furthermore, we propose a Thompson sampling (TS) algorithm and provide an approximate regret upper bound for it. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of the proposed algorithms compared to other methods.
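The round structure described above (observe $K$ feature vectors for each of $N$ users, assign each user to an arm, observe GLM feedback) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the proposed UCB or TS algorithm: the logistic link, the "at least one positive feedback per arm" satisfaction measure, the plug-in greedy assignment, and all function names are hypothetical choices made only to make the protocol concrete.

```python
import numpy as np


def sigmoid(z):
    """Logistic link; an assumed instance of the GLM link function."""
    return 1.0 / (1.0 + np.exp(-z))


def greedy_assignment(features, theta_hat):
    """Assign each user to the arm with the highest estimated score.

    features: array of shape (N, K, d), one feature vector per (user, arm).
    theta_hat: array of shape (d,), a parameter estimate.
    This is the match-maximizing baseline, not an arm-satisfaction policy.
    """
    scores = features @ theta_hat          # shape (N, K)
    return np.argmax(scores, axis=1)       # shape (N,)


def play_round(features, assignment, theta, rng):
    """Draw binary feedback for the chosen (user, arm) pairs."""
    n_users = features.shape[0]
    chosen = features[np.arange(n_users), assignment]   # shape (N, d)
    probs = sigmoid(chosen @ theta)
    return (rng.random(n_users) < probs).astype(int)


def arm_satisfaction(assignment, feedback, n_arms):
    """Hypothetical satisfaction: 1 if an arm got any positive feedback."""
    positives = np.bincount(assignment, weights=feedback, minlength=n_arms)
    return np.minimum(positives, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_users, n_arms, dim, horizon = 6, 3, 4, 5
    theta = rng.normal(size=dim)           # unknown to a real learner
    for t in range(1, horizon + 1):
        features = rng.normal(size=(n_users, n_arms, dim))
        assignment = greedy_assignment(features, theta)
        feedback = play_round(features, assignment, theta, rng)
        satisfied = arm_satisfaction(assignment, feedback, n_arms)
        print(t, assignment, feedback, satisfied.sum())
```

The greedy rule here tends to pile users onto whichever arm scores highest, which is exactly the concentration effect the arm-satisfaction objective is meant to counteract; a CAB learner would replace `greedy_assignment` with a policy that optimizes the satisfaction objective under uncertainty about `theta`.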