Online decision making plays a crucial role in numerous real-world applications. In many scenarios, a decision is made by performing a sequence of tests on incoming data points. However, performing every test can be expensive and is not always possible. In this paper, we propose a novel formulation of the online decision making problem based on combinatorial multi-armed bandits that takes the cost of performing tests into account. Based on this formulation, we develop a new framework for cost-efficient online decision making that can use posterior sampling or BayesUCB for exploration. We provide a rigorous theoretical analysis of our framework and present experimental results that demonstrate its applicability to real-world problems.