Foundation models often struggle with uncertainty when faced with new situations in online decision-making, necessitating scalable and efficient exploration to resolve this uncertainty. We introduce GPT-HyperAgent, an augmentation of GPT with HyperAgent for uncertainty-aware, scalable exploration in contextual bandits, a fundamental online decision problem involving natural language input. We prove that HyperAgent achieves fast incremental uncertainty estimation with $\tilde{O}(\log T)$ per-step computational complexity over $T$ periods under the linear realizable assumption. Our analysis demonstrates that HyperAgent's regret order matches that of exact Thompson sampling in linear contextual bandits, closing a significant theoretical gap in scalable exploration. Empirical results in real-world contextual bandit tasks, such as automated content moderation with human feedback, validate the practical effectiveness of GPT-HyperAgent for safety-critical decisions. Our code is open-sourced at \url{https://github.com/szrlee/GPT-HyperAgent/}.
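For orientation, the exact Thompson sampling baseline that the regret comparison refers to can be sketched for a linear contextual bandit as follows. This is a minimal illustration of that baseline, not of HyperAgent or GPT-HyperAgent. The class name `LinearThompsonSampling`, the helper `run`, the Gaussian prior and noise model, and the synthetic random features are all assumptions made for the example. Natural language inputs are assumed to have already been mapped to feature vectors.

```python
import numpy as np


class LinearThompsonSampling:
    """Exact Thompson sampling with a Gaussian prior and Gaussian noise (illustrative)."""

    def __init__(self, dim, prior_var=1.0, noise_var=1.0, seed=0):
        self.noise_var = noise_var
        self.mean = np.zeros(dim)             # posterior mean of the parameter
        self.cov = prior_var * np.eye(dim)    # posterior covariance of the parameter
        self.rng = np.random.default_rng(seed)

    def act(self, features):
        # Sample one parameter vector from the posterior and act greedily on it.
        theta = self.rng.multivariate_normal(self.mean, self.cov)
        return int(np.argmax(features @ theta))

    def update(self, x, reward):
        # Rank-one (Sherman-Morrison / Kalman-style) posterior update.
        cov_x = self.cov @ x
        denom = self.noise_var + x @ cov_x
        self.mean = self.mean + cov_x * (reward - x @ self.mean) / denom
        cov = self.cov - np.outer(cov_x, cov_x) / denom
        self.cov = 0.5 * (cov + cov.T)        # keep the covariance symmetric


def run(T=500, dim=5, n_arms=10, seed=0):
    """Simulate T periods on synthetic data and return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)         # hypothetical true parameter
    agent = LinearThompsonSampling(dim, seed=seed + 1)
    regret = 0.0
    for _ in range(T):
        features = rng.normal(size=(n_arms, dim))   # one feature vector per action
        action = agent.act(features)
        expected = features @ theta_star
        reward = expected[action] + rng.normal()    # unit-variance Gaussian noise
        agent.update(features[action], reward)
        regret += expected.max() - expected[action]
    return regret, agent, theta_star


if __name__ == "__main__":
    regret, agent, theta_star = run()
    print("cumulative regret:", regret)
    print("parameter error:", np.linalg.norm(agent.mean - theta_star))
```

In this sketch, every period maintains and samples from a full posterior covariance over the parameter vector. Costs of that kind motivate approximate, incrementally updated uncertainty estimates like those the abstract describes.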