In this paper, we consider a general observation model for restless multi-armed bandit problems. The player's decisions must rely on a feedback mechanism that is error-prone because of resource constraints or environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space that starts from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation laws (PCL) to this infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation procedure that transforms the problem into a finite-state one, to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm performs excellently.
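The abstract does not fix a particular observation model. To illustrate how a countable belief state space arises from an arbitrary initial belief, the sketch below makes an assumption: each arm is a two-state (bad/good) Markov chain with transition probabilities `p01` and `p11`, observed through a noisy binary sensor with a miss probability and a false-alarm probability, both assumed strictly between 0 and 1. The names `ArmModel`, `predict`, `posterior`, `successors`, and `reachable_beliefs` are hypothetical and are not the authors' formulation. Each slot yields only finitely many successor beliefs (one if the arm is passive, one per observation if it is active), so the beliefs reachable from `w0` form a countable union of finite sets.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ArmModel:
    """Hypothetical two-state arm with a noisy binary sensor (illustrative only)."""
    p01: Fraction          # P(bad -> good) between slots
    p11: Fraction          # P(good -> good) between slots
    miss: Fraction         # P(observe 0 | good), assumed in (0, 1)
    false_alarm: Fraction  # P(observe 1 | bad), assumed in (0, 1)

    def predict(self, w):
        """Next-slot belief P(good) when no new observation is made."""
        return w * self.p11 + (1 - w) * self.p01

    def posterior(self, w, obs):
        """Bayes update of P(good) after a noisy observation obs in {0, 1}."""
        like_good = (1 - self.miss) if obs == 1 else self.miss
        like_bad = self.false_alarm if obs == 1 else (1 - self.false_alarm)
        # Denominator is positive because both likelihoods are in (0, 1).
        return w * like_good / (w * like_good + (1 - w) * like_bad)

    def successors(self, w):
        """Possible next-slot beliefs: one if passive, one per observation if active."""
        passive = {self.predict(w)}
        active = {self.predict(self.posterior(w, obs)) for obs in (0, 1)}
        return passive | active


def reachable_beliefs(model, w0, depth):
    """All beliefs reachable from the initial belief w0 within `depth` slots."""
    seen = {w0}
    frontier = {w0}
    for _ in range(depth):
        # Expand only beliefs discovered in the previous slot.
        frontier = {v for w in frontier for v in model.successors(w)} - seen
        seen |= frontier
    return seen


if __name__ == "__main__":
    arm = ArmModel(Fraction(1, 5), Fraction(4, 5), Fraction(1, 10), Fraction(1, 10))
    print(sorted(reachable_beliefs(arm, Fraction(1, 2), 3)))
```

Exact `Fraction` arithmetic is used so that identical beliefs reached along different paths are merged in the set; with floating point, rounding could split them into spurious distinct states.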