Is it possible to make online decisions when personalized covariates are unavailable? We take a collaborative-filtering approach, basing decisions on collective preferences. By assuming low-dimensional latent features, we formulate the covariate-free decision-making problem as a matrix completion bandit. We propose a policy learning procedure that combines an $\varepsilon$-greedy policy for decision-making with an online gradient descent algorithm for estimating the bandit parameters. A novel two-phase design balances policy learning accuracy against regret. For policy inference, we develop an online debiasing method based on inverse propensity weighting and establish its asymptotic normality. Applied to data from the San Francisco parking pricing project, our methods yield interesting findings and outperform the benchmark policy.
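The interplay of $\varepsilon$-greedy exploration, online gradient descent on low-rank factors, and propensity-weighted debiasing can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' procedure. It assumes the following (all hypothetical, not from the source): rows of the reward matrix are contexts that arrive uniformly at random (think of parking blocks), columns are arms (think of price levels), and rewards are a rank-`rank` matrix plus Gaussian noise. The two-phase schedule is modeled as a larger exploration rate `eps1` for the first `T0` steps, followed by a smaller `eps2`. For debiasing, it substitutes a simple augmented inverse-propensity-weighted (AIPW) correction of the pre-update estimate. This correction shows why propensities matter, but it is not the paper's estimator, and the sketch does not attempt the asymptotic-normality analysis.

```python
import random


def dot(x, y):
    # Inner product of two equal-length lists.
    return sum(a * b for a, b in zip(x, y))


def simulate_mc_bandit(n_rows=8, n_arms=5, rank=2, T=20000, T0=5000,
                       eps1=0.5, eps2=0.05, lr=0.05, noise=0.1, seed=0):
    """Hypothetical epsilon-greedy matrix completion bandit with online SGD."""
    rng = random.Random(seed)
    # Ground-truth low-rank reward matrix M = U* V*^T (unknown to the learner).
    U_true = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_rows)]
    V_true = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_arms)]
    M = [[dot(u, v) for v in V_true] for u in U_true]
    # Learner's factor estimates with a small random initialization.
    U = [[0.1 * rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[0.1 * rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_arms)]

    arms, regret = [], 0.0
    aipw_sum = [[0.0] * n_arms for _ in range(n_rows)]  # debiased running sums
    row_count = [0] * n_rows

    for t in range(T):
        eps = eps1 if t < T0 else eps2  # hypothetical two-phase exploration rate
        i = rng.randrange(n_rows)       # context (row) arrives
        est = [dot(U[i], V[j]) for j in range(n_arms)]
        greedy = max(range(n_arms), key=lambda j: est[j])
        a = rng.randrange(n_arms) if rng.random() < eps else greedy
        # Propensity of each arm under the epsilon-greedy rule at this step.
        prop = [eps / n_arms] * n_arms
        prop[greedy] += 1.0 - eps
        y = M[i][a] + noise * rng.gauss(0, 1)

        # AIPW term per arm: estimate plus propensity-weighted residual
        # (conditionally unbiased for M[i][j] given the history).
        for j in range(n_arms):
            aipw = est[j]
            if j == a:
                aipw += (y - est[j]) / prop[j]
            aipw_sum[i][j] += aipw
        row_count[i] += 1

        # Online gradient step on the squared loss 0.5 * (<U_i, V_a> - y)^2,
        # with both gradients computed from the pre-update factors.
        resid = est[a] - y
        grad_u = [resid * v for v in V[a]]
        grad_v = [resid * u for u in U[i]]
        U[i] = [u - lr * g for u, g in zip(U[i], grad_u)]
        V[a] = [v - lr * g for v, g in zip(V[a], grad_v)]

        arms.append(a)
        regret += max(M[i]) - M[i][a]  # gap to the best arm for this row

    debiased = [[s / max(c, 1) for s in row] for row, c in zip(aipw_sum, row_count)]
    M_hat = [[dot(u, v) for v in V] for u in U]
    return {"M": M, "M_hat": M_hat, "debiased": debiased,
            "arms": arms, "regret": regret}
```

For example, `out = simulate_mc_bandit()` returns the plug-in estimate `out["M_hat"]`, the propensity-corrected estimate `out["debiased"]`, and the cumulative regret. The larger first-phase exploration rate gives more balanced data for estimation. The smaller second-phase rate limits regret but makes propensities for non-greedy arms small, which inflates the variance of the weighted correction. This is the accuracy-versus-regret tension the two-phase design is meant to manage.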