We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision-making problems in which the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, and offer selection, all of which arise frequently in finance). Although Large Language Models (LLMs) are increasingly applied to these settings, invoking an LLM for reasoning at every decision step is computationally expensive, and uncertainty estimates are difficult to obtain. To address this, we introduce LLMP-UCB, a bandit algorithm that derives uncertainty estimates from LLMs via repeated inference. However, our experiments demonstrate that lightweight numerical bandits operating on text embeddings (dense or Matryoshka) match or exceed the accuracy of LLM-based solutions at a fraction of their cost. We further show that embedding dimensionality is a practical lever on the exploration-exploitation balance, enabling cost-performance tradeoffs without added prompt complexity. Finally, to guide practitioners, we propose a geometric diagnostic based on the arms' embeddings for deciding when to use LLM-driven reasoning versus a lightweight numerical bandit. Our results provide a principled deployment framework for cost-effective, uncertainty-aware decision systems with broad applicability across AI use cases in financial services.
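To make the idea of a lightweight numerical bandit on text embeddings concrete, the following is a minimal sketch under stated assumptions. It uses disjoint LinUCB, a standard linear contextual bandit, as a stand-in; the abstract does not specify which numerical bandit the authors use, and this is not LLMP-UCB. The names `truncate_embedding` and `LinUCB`, and the parameters `alpha` and `ridge`, are hypothetical. The truncation step mimics how a Matryoshka-style embedding can be shortened to its first `dim` coordinates, which is one plausible way the embedding dimension could act as a tuning knob.

```python
import numpy as np


def truncate_embedding(x, dim):
    """Keep the first `dim` coordinates and re-normalize to unit length.

    Assumption: the embedding is Matryoshka-style, so a prefix of the
    vector is still a usable (coarser) representation.
    """
    v = np.asarray(x, dtype=float)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (illustrative only)."""

    def __init__(self, n_arms, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # width of the exploration bonus
        # Per-arm Gram matrix A_a = ridge * I + sum of x x^T
        self.A = np.stack([ridge * np.eye(dim) for _ in range(n_arms)])
        # Per-arm reward-weighted context sum b_a = sum of r * x
        self.b = np.zeros((n_arms, dim))

    def scores(self, x):
        """Upper confidence bound per arm: theta_a^T x + alpha * sqrt(x^T A_a^{-1} x)."""
        out = np.empty(len(self.A))
        for a in range(len(self.A)):
            a_inv_x = np.linalg.solve(self.A[a], x)
            theta = np.linalg.solve(self.A[a], self.b[a])
            out[a] = theta @ x + self.alpha * np.sqrt(x @ a_inv_x)
        return out

    def select(self, x):
        """Pick the arm with the highest upper confidence bound."""
        return int(np.argmax(self.scores(x)))

    def update(self, arm, x, reward):
        """Fold one observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, a full-length text embedding would first pass through `truncate_embedding(vec, dim)` and the result would be fed to `select` and `update`. Each score costs a pair of `dim`-by-`dim` linear solves rather than an LLM call, and shrinking `dim` changes both the per-step cost and the size of the confidence term.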