Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. We analyse four decision-making policies, namely Explore-then-Commit, Action Elimination, Lower Confidence Bound (LCB), and Thompson Sampling, and provide sharp theoretical regret guarantees for each. Unlike in classical bandit settings, Explore-then-Commit and Action Elimination incur suboptimal regret because they commit to a fixed ordering after the exploration phase, which limits their ability to adapt. In contrast, LCB and Thompson Sampling continuously update their decisions based on observed feedback and achieve constant O(1) regret. Simulations corroborate these theoretical findings, highlighting the crucial role of adaptivity for efficient edge inference under uncertainty.