Large language models (LLMs) have become powerful and widely used systems for language understanding and generation, while multi-armed bandit (MAB) algorithms provide a principled framework for adaptive decision-making under uncertainty. This survey explores the intersection of these two fields. To the best of our knowledge, it is the first survey to systematically review the bidirectional interaction between large language models and multi-armed bandits at the component level. The benefits flow in both directions: MAB algorithms address critical LLM challenges spanning pre-training, retrieval-augmented generation (RAG), and personalization; conversely, LLMs enhance MAB systems by redefining core components such as arm definition and environment modeling, thereby improving decision-making in sequential tasks. We analyze existing LLM-enhanced bandit systems and bandit-enhanced LLM systems, providing insights into their design, methodologies, and performance. We identify key challenges and representative findings to guide future research. An accompanying GitHub repository that indexes the relevant literature is available at https://github.com/bucky1119/Awesome-LLM-Bandit-Interaction.