We introduce Detection Augmented Learning (DAL), a practical black-box framework for piecewise stationary bandits that requires no knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm with order-optimal regret as input and augments it with a change detector, which makes it applicable to all common bandit variants. Extensive experiments demonstrate that DAL consistently surpasses all state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We also provide theoretical insights that help explain DAL's strong empirical performance and support them with thorough empirical validation.
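To make the black-box idea concrete, the following is a minimal sketch of wrapping a stationary learner with a change detector. It is not the authors' implementation: the abstract does not specify the detector, the restart rule, or any exploration scheme, so the two-half sliding-window mean test, the uniform forced exploration, and all names here (`UCB1`, `WindowMeanDetector`, `DetectionAugmented`, `simulate`, `explore_prob`, `window`, `threshold`) are illustrative assumptions.

```python
import math
import random
from collections import deque


class UCB1:
    """Stationary learner used as the black-box input (any such algorithm could be swapped in)."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms
        self.t = 0

    def select_arm(self):
        # Pull every arm once before using the confidence index.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            return mean + math.sqrt(2.0 * math.log(self.t) / self.counts[arm])

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.t += 1


class WindowMeanDetector:
    """Assumed detector: compares the means of the two halves of a sliding window."""

    def __init__(self, window=40, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def reset(self):
        self.samples.clear()

    def update(self, reward):
        """Add one reward; return True if a change is flagged."""
        self.samples.append(reward)
        if len(self.samples) < self.window:
            return False
        values = list(self.samples)
        half = self.window // 2
        old_mean = sum(values[:half]) / half
        new_mean = sum(values[half:]) / (self.window - half)
        return abs(new_mean - old_mean) > self.threshold


class DetectionAugmented:
    """Wraps a stationary learner; restarts it when any per-arm detector fires."""

    def __init__(self, learner, n_arms, make_detector, explore_prob=0.1, seed=0):
        self.learner = learner
        self.detectors = [make_detector() for _ in range(n_arms)]
        self.explore_prob = explore_prob  # assumed forced-exploration rate
        self.rng = random.Random(seed)
        self.restarts = 0

    def select_arm(self):
        # Forced exploration keeps every detector supplied with fresh samples.
        if self.rng.random() < self.explore_prob:
            return self.rng.randrange(len(self.detectors))
        return self.learner.select_arm()

    def update(self, arm, reward):
        self.learner.update(arm, reward)
        if self.detectors[arm].update(reward):
            # Change flagged: discard all history and start over.
            self.learner.reset()
            for detector in self.detectors:
                detector.reset()
            self.restarts += 1


def simulate(means_before, means_after, change_at, horizon, noise=0.0, seed=0):
    """Run the wrapper on a toy two-phase environment and return the agent."""
    rng = random.Random(seed)
    n_arms = len(means_before)
    agent = DetectionAugmented(UCB1(n_arms), n_arms, WindowMeanDetector, seed=seed)
    for t in range(horizon):
        means = means_before if t < change_at else means_after
        arm = agent.select_arm()
        reward = means[arm] + rng.uniform(-noise, noise)
        agent.update(arm, reward)
    return agent
```

For example, `simulate([0.9, 0.1], [0.1, 0.9], change_at=500, horizon=1000)` swaps the two arm means halfway through; `agent.restarts` then reports how often the learner was reset. The wrapper only calls `select_arm`, `update`, and `reset` on the learner, which is what keeps it black-box in this sketch.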