We study the multichannel quickest change detection problem with bandit feedback and controlled sensing, in which an agent sequentially selects one of the data streams to observe at each time step and aims to detect an unknown change as quickly as possible while controlling false alarms. Assuming known pre- and post-change distributions and allowing an arbitrary subset of streams to be affected by the change, we propose two novel and computationally efficient detection procedures inspired by the Upper Confidence Bound (UCB) multi-armed bandit algorithm. Our methods adaptively concentrate sensing on the most informative streams while preserving false-alarm guarantees. We show that both procedures achieve first-order asymptotic optimality in detection delay under standard false-alarm constraints. We also extend the UCB-driven controlled sensing approach to the setting where the pre- and post-change distributions are unknown, except for a mean shift in at least one of the channels at the change point. This setting is particularly relevant to the problem of learning in piecewise stationary environments. Finally, extensive simulations on synthetic benchmarks show that our methods consistently outperform existing state-of-the-art approaches while offering substantial computational savings.
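To make the controlled-sensing idea concrete, the following is a minimal sketch of one way a UCB-style index can steer a per-stream CUSUM detector when only one stream is observed per time step. It is an illustration under stated assumptions and not the procedures proposed in this work: the function name `ucb_cusum_detect`, the exploration bonus `sqrt(bonus * log(t) / n_k)`, the choice of the CUSUM statistic as the exploitation term, and all parameter values are hypothetical.

```python
import math
import random


def ucb_cusum_detect(sample, llr, num_streams, threshold, bonus=1.0, max_steps=100_000):
    """Illustrative UCB-guided CUSUM (hypothetical, not the paper's procedure).

    sample(k, t) returns the observation of stream k at time t.
    llr(x) returns the log-likelihood ratio of post- vs pre-change at x.
    Returns (stopping_time, flagged_stream), or (None, None) if no alarm is raised.
    """
    stats = [0.0] * num_streams  # per-stream CUSUM statistics
    counts = [0] * num_streams   # how often each stream has been observed
    for t in range(1, max_steps + 1):
        if t <= num_streams:
            k = t - 1  # observe every stream once before using the index
        else:
            # Index = current evidence of a change + exploration bonus.
            k = max(
                range(num_streams),
                key=lambda i: stats[i] + math.sqrt(bonus * math.log(t) / counts[i]),
            )
        x = sample(k, t)
        counts[k] += 1
        stats[k] = max(0.0, stats[k] + llr(x))  # standard CUSUM recursion
        if stats[k] >= threshold:
            return t, k  # raise an alarm on stream k
    return None, None


if __name__ == "__main__":
    random.seed(0)
    change_time, affected, shift = 200, {2}, 1.0  # assumed toy scenario

    def gaussian_sample(k, t):
        mean = shift if (t >= change_time and k in affected) else 0.0
        return random.gauss(mean, 1.0)

    def gaussian_llr(x):
        # Log-likelihood ratio of N(shift, 1) against N(0, 1).
        return shift * x - 0.5 * shift ** 2

    print(ucb_cusum_detect(gaussian_sample, gaussian_llr, num_streams=5, threshold=10.0))
```

In this sketch the bonus term keeps every stream sampled occasionally, so a change in a rarely observed stream is eventually noticed, while the CUSUM term concentrates observations on streams that already show evidence of a change. The threshold trades detection delay against false alarms; how it should be calibrated to meet a given false-alarm constraint is not addressed here.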