In this study, we investigate the Thresholding Linear Bandit (TLB) problem, a subclass of stochastic Multi-Armed Bandit (MAB) problems in which the goal is to maximize decision accuracy against a linearly defined threshold under resource constraints. We present LinearAPT, a novel algorithm designed for the fixed-budget setting of TLB that provides an efficient solution for optimizing sequential decision-making. The algorithm comes with a theoretical upper bound on the estimated loss and performs robustly on both synthetic and real-world datasets. Our contributions highlight the adaptability, simplicity, and computational efficiency of LinearAPT, making it a valuable addition to the toolkit for complex sequential decision-making problems.