In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent's capabilities, we provide communication schemes along with their analysis; interestingly, these relate to the zero-error capacity of the underlying DMC.
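As background on the zero-error capacity C_0 mentioned above, here is the standard textbook definition. It does not reproduce the paper's own schemes, and it ignores the bandit side entirely. Two channel inputs are confusable if some output has positive probability under both. A block code of length n has zero error exactly when its codewords are pairwise non-confusable, so its largest size is the independence number alpha of the n-fold strong product of the confusability graph G. This gives C_0 = sup_n (1/n) log2 alpha(G^n). The sketch below is a brute-force computation for small channels. The channel is assumed to be given as a row-stochastic transition matrix `P`, and every function name (`confusable`, `zero_error_code_size`, `pentagon_channel`, ...) is hypothetical. It uses Shannon's classic pentagon channel as the example.

```python
import itertools
import math


def confusable(P, x, xp):
    """Two inputs are confusable if some output has positive probability under both."""
    return any(P[x][y] > 0 and P[xp][y] > 0 for y in range(len(P[x])))


def max_independent_set_size(n_vertices, nbr_mask):
    """Branch-and-bound maximum independent set; nbr_mask[v] is a bitmask of v's neighbours."""
    best = 0

    def search(size, cand):
        nonlocal best
        if cand == 0:
            best = max(best, size)
            return
        if size + bin(cand).count("1") <= best:
            return  # even taking every candidate cannot beat the incumbent
        v = (cand & -cand).bit_length() - 1  # lowest-index remaining candidate
        search(size + 1, cand & ~nbr_mask[v] & ~(1 << v))  # put v in the code
        search(size, cand & ~(1 << v))  # leave v out

    search(0, (1 << n_vertices) - 1)
    return best


def zero_error_code_size(P, n):
    """Largest zero-error code of block length n: alpha of the n-fold strong product."""
    seqs = list(itertools.product(range(len(P)), repeat=n))
    nbr_mask = []
    for i, s in enumerate(seqs):
        mask = 0
        for j, t in enumerate(seqs):
            # Distinct sequences collide iff every coordinate is equal or confusable.
            if i != j and all(confusable(P, a, b) for a, b in zip(s, t)):
                mask |= 1 << j
        nbr_mask.append(mask)
    return max_independent_set_size(len(seqs), nbr_mask)


def pentagon_channel():
    """Shannon's pentagon: input x goes to output x or x+1 (mod 5), each w.p. 1/2."""
    P = [[0.0] * 5 for _ in range(5)]
    for x in range(5):
        P[x][x] = 0.5
        P[x][(x + 1) % 5] = 0.5
    return P


if __name__ == "__main__":
    P = pentagon_channel()
    for n in (1, 2):
        m = zero_error_code_size(P, n)
        # (1/n) log2 m is a lower bound on the zero-error capacity C_0.
        print(f"n={n}: {m} codewords, rate {math.log2(m) / n:.3f} bits/use")
```

For the pentagon channel, a single use supports only 2 zero-error symbols. Two uses support 5 codewords, e.g. {(i, 2i mod 5)}, so coding across uses strictly raises the zero-error rate. The brute-force search is exponential, so this only suits very small alphabets and block lengths.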