Assume that an interferer behaves according to a parametric model whose parameter values are unknown. Sensing improves knowledge of the model and thus enables better link adaptation. However, we consider a half-duplex scenario in which, at each time slot, the communication system must choose between sensing and communicating. We therefore investigate the optimal policy that maximizes the expected sum rate over a finite communication horizon. We first show that this problem can be modelled within the Markov decision process (MDP) framework. We then demonstrate that the optimal open-loop and closed-loop policies can be found significantly faster than with the standard backward-induction algorithm.
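To make the sense-or-communicate trade-off concrete, the following is a minimal sketch of standard backward induction on a toy finite-horizon model. It is not the model or the accelerated method of this work. Everything in it is an assumption made for illustration: the state is just the number of sensing slots used so far, sensing earns no rate, and the hypothetical function `rate(n)` gives a per-slot rate that increases with the number of past sensing slots `n`.

```python
def rate(n):
    """Hypothetical per-slot rate after n sensing slots (assumed increasing in n)."""
    return 1.0 - 0.5 / (n + 1)


def backward_induction(horizon):
    """Solve the toy sense-or-communicate problem by backward induction.

    Returns the optimal sum rate and the action taken at each slot,
    starting from zero sensing slots.
    """
    # value[n] stores V_{t+1}(n); the terminal value is zero for every state.
    value = [0.0] * (horizon + 2)
    decisions = [None] * horizon  # decisions[t][n] is the best action at slot t
    for t in range(horizon - 1, -1, -1):
        new_value = [0.0] * (horizon + 2)
        best = {}
        for n in range(t + 1):  # at most t sensing slots can precede slot t
            communicate = rate(n) + value[n]  # earn rate(n), state unchanged
            sense = value[n + 1]              # earn nothing, state n -> n + 1
            if communicate >= sense:
                new_value[n], best[n] = communicate, "communicate"
            else:
                new_value[n], best[n] = sense, "sense"
        value, decisions[t] = new_value, best
    # Roll the stored decisions forward from the initial state n = 0.
    policy, n = [], 0
    for t in range(horizon):
        action = decisions[t][n]
        policy.append(action)
        if action == "sense":
            n += 1
    return value[0], policy


def best_sense_first(horizon):
    """Best value among policies that sense for k slots, then only communicate."""
    return max((horizon - k) * rate(k) for k in range(horizon + 1))


if __name__ == "__main__":
    optimal_value, optimal_policy = backward_induction(10)
    print(optimal_value, optimal_policy)
    print(best_sense_first(10))
```

In this toy model the transitions are deterministic, so open-loop and closed-loop policies coincide, and because `rate` is increasing, a one-dimensional search over "sense for k slots, then communicate" reaches the same value as the full recursion. This only illustrates how problem structure can shortcut backward induction; the model with unknown interferer parameters considered here is richer.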