Two-sided matching platforms rely on preferences from both sides, yet participants can evaluate only a small fraction of potential partners. In practice, they use low-cost pre-match screening, e.g., interviews, profile views, or trial tasks, to form noisy impressions before committing to applications and offers. We study bandit learning in matching markets with interviews, modeling these interactions as queried \emph{hints}~\citep{DBLP:conf/innovations/BhaskaraGIKM23} that reveal partial preference information to both sides while constraining subsequent applications. Our framework also allows firm-side uncertainty: firms, like agents, learn their preferences and may make early hiring mistakes. To address this, we introduce strategic deferral, a firm-side action that permits temporary vacancy, corrects premature commitments, and enables decentralized learning under coarse anonymous feedback. We design algorithms for centralized and decentralized markets and show that a constant number of interviews per round suffices for horizon-independent regret, improving over the $O(\log T)$ guarantees known without interviews. Our bounds are near-optimal: the centralized guarantee is within a factor $m$ of an information-theoretic lower bound, while decentralized algorithms match it up to polynomial factors in structured markets and remain horizon-independent in general markets.