We study contextual linear bandit problems under feature uncertainty, where the observed features are noisy and have missing entries. To address the challenges posed by the noise, we analyze Bayesian oracles given the observed noisy features. Our Bayesian analysis finds that the optimal hypothesis can be far from the underlying realizability function, depending on the noise characteristics. This behavior is highly non-intuitive and does not occur in classical noiseless setups. It implies that classical approaches cannot guarantee a non-trivial regret bound. We therefore propose an algorithm that targets the Bayesian oracle using only the observed information under this model, achieving an $\tilde{O}(d\sqrt{T})$ regret bound when the number of arms is large. We demonstrate the proposed algorithm on synthetic and real-world datasets.
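The claim that the best predictor on noisy features can differ from the true parameter can be illustrated with a minimal sketch. It is not the paper's model or algorithm. It assumes zero-mean Gaussian features with a known covariance, additive isotropic Gaussian feature noise, and no missing entries. The names `bayes_weights`, `theta`, `cov`, and `noise_var` are hypothetical. Under these assumptions the conditional mean of the reward given the noisy feature $z$ is linear in $z$, with weights $(\Sigma + \sigma^2 I)^{-1} \Sigma \theta$ rather than $\theta$.

```python
import numpy as np


def bayes_weights(theta, cov, noise_var):
    """Return w with E[theta @ x | z] = w @ z.

    Assumes x ~ N(0, cov) and z = x + e, where e ~ N(0, noise_var * I).
    """
    d = len(theta)
    return np.linalg.solve(cov + noise_var * np.eye(d), cov @ theta)


rng = np.random.default_rng(0)
d, n, noise_var = 3, 200_000, 1.0
theta = np.array([1.0, -2.0, 0.5])   # true (hypothetical) reward parameter
cov = np.diag([1.0, 0.25, 4.0])      # assumed feature covariance

# Clean features, noisy observed features, and rewards from the clean model.
x = rng.multivariate_normal(np.zeros(d), cov, size=n)
z = x + rng.normal(scale=np.sqrt(noise_var), size=(n, d))
y = x @ theta + 0.1 * rng.normal(size=n)

# Closed-form target under the assumed Gaussian noise model.
w_bayes = bayes_weights(theta, cov, noise_var)

# Ordinary least squares fitted on the noisy features.
w_ls, *_ = np.linalg.lstsq(z, y, rcond=None)

print("true theta      :", theta)
print("Bayes weights   :", w_bayes)
print("OLS on noisy z  :", w_ls)
```

In this toy setting the least-squares fit on noisy features approaches the shrunken weights, not `theta`, and the amount of shrinkage per coordinate depends on the ratio of feature variance to noise variance.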