We introduce Flickering Multi-Armed Bandits (FMAB), a new MAB framework where the set of available arms (or actions) can change at each round, and the available set at any time may depend on the agent's previously selected arm. We model this constrained, evolving availability using random graph processes, where arms are nodes and the agent's movement is restricted to its local neighborhood. We analyze this problem under two random graph models: an i.i.d. Erdős--Rényi (ER) process and an Edge-Markovian process. We propose and analyze a two-phase algorithm that employs a lazy random walk for exploration to efficiently identify the optimal arm, followed by a navigation and commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds for both graph settings. We show that the exploration cost of our algorithm is near-optimal by establishing a matching information-theoretic lower bound for this problem class, highlighting the fundamental cost of exploration under local-move constraints. We complement our theoretical guarantees with numerical simulations, including a scenario of a robotic ground vehicle scouting a disaster-affected region.
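The two-phase scheme described above can be illustrated with a toy simulation. The sketch below is a minimal illustration only, not the paper's exact algorithm: the parameter names, the fresh i.i.d. Erdős–Rényi graph drawn each round, the Bernoulli rewards, and the simple one-step "move to the empirical best arm when it appears as a neighbor" navigation rule are all assumptions made for exposition.

```python
import random


def simulate_fmab(n=20, p=0.5, horizon=2000, explore_rounds=500, seed=0):
    """Toy two-phase FMAB simulation on an i.i.d. ER graph process.

    Phase 1 (t < explore_rounds): lazy random walk for exploration.
    Phase 2: navigate toward, then commit to, the empirically best arm.
    All names and the navigation rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(n)]  # hidden Bernoulli arm means
    best_mean = max(means)
    counts = [0] * n          # pulls per arm
    sums = [0.0] * n          # cumulative reward per arm
    pos = rng.randrange(n)    # agent starts at a uniformly random node
    reward_total = 0.0

    def neighbors(v):
        # Fresh ER graph each round: each edge present independently w.p. p.
        return [u for u in range(n) if u != v and rng.random() < p]

    for t in range(horizon):
        nbrs = neighbors(pos)
        if t < explore_rounds:
            # Lazy random walk: stay with prob 1/2, else a uniform neighbor.
            if nbrs and rng.random() < 0.5:
                pos = rng.choice(nbrs)
        else:
            # Exploitation: hop to the empirical best arm whenever the
            # flickering graph makes it locally reachable, else stay put.
            target = max(
                range(n),
                key=lambda i: sums[i] / counts[i] if counts[i] else 0.0,
            )
            if pos != target and target in nbrs:
                pos = target
        # Pull the arm at the current node and record the reward.
        r = 1.0 if rng.random() < means[pos] else 0.0
        counts[pos] += 1
        sums[pos] += r
        reward_total += r

    # Realized regret against always pulling the best arm.
    return horizon * best_mean - reward_total
```

Lengthening the exploration phase trades immediate reward for a more reliable estimate of the best arm, mirroring the exploration-cost trade-off that the paper's lower bound makes precise.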