The rise of large-scale pretrained models has made it feasible to generate predictive or synthetic features at low cost, raising the question of how to incorporate such surrogate predictions into downstream decision-making. We study this problem in the setting of online linear contextual bandits, where contexts may be complex, nonstationary, and only partially observed. In addition to bandit data, we assume access to an auxiliary dataset containing fully observed contexts, which is common in practice since such data are collected without adaptive interventions. We propose PULSE-UCB, an algorithm that leverages pretrained models trained on the auxiliary data to impute missing features during online decision-making. We establish regret guarantees that decompose into a standard bandit term plus an additional component reflecting pretrained model quality. In the i.i.d. context case with Hölder-smooth missing features, PULSE-UCB achieves near-optimal performance, supported by matching lower bounds. Our results quantify how uncertainty in predicted contexts affects decision quality and how much historical data is needed to improve downstream learning.
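Since the abstract only names the algorithm, the following is a minimal sketch of the impute-then-decide idea it describes: a LinUCB-style learner that acts on contexts completed by a pretrained imputer. Everything concrete here (the mean-imputation stand-in `pretrained_imputer`, the parameters `lam` and `alpha`, the masking rate) is an illustrative assumption, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 4, 2000                          # feature dim, arms, horizon
theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown reward parameter

def pretrained_imputer(observed, mask):
    """Stand-in for a model fit on the auxiliary fully observed data.
    Here it fills unobserved coordinates with an assumed mean of 0."""
    x = observed.copy()
    x[~mask] = 0.0
    return x

lam, alpha = 1.0, 1.0       # ridge regularizer and UCB width (illustrative)
A = lam * np.eye(d)         # regularized Gram matrix of imputed contexts
b = np.zeros(d)

for t in range(T):
    true_ctx = rng.normal(size=(K, d))       # true contexts, partially hidden
    mask = rng.random((K, d)) > 0.3          # which coordinates are observed
    X_hat = np.array([pretrained_imputer(np.where(m, x, 0.0), m)
                      for x, m in zip(true_ctx, mask)])

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                    # ridge estimate of theta_star
    width = np.sqrt(np.einsum('kd,de,ke->k', X_hat, A_inv, X_hat))
    arm = int(np.argmax(X_hat @ theta_hat + alpha * width))

    # bandit feedback is generated by the true (partially hidden) context
    reward = true_ctx[arm] @ theta_star + 0.1 * rng.normal()
    A += np.outer(X_hat[arm], X_hat[arm])    # updates use imputed features
    b += reward * X_hat[arm]
```

In this toy version, the gap between `X_hat` and `true_ctx` is what would contribute the imputation-quality term in the regret decomposition the abstract mentions.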