Assume that an interferer behaves according to a parametric model whose parameter values are unknown. Sensing improves knowledge of the model and therefore enables better link adaptation. However, we consider a half-duplex scenario in which, at each time slot, the communication system must choose between sensing and communicating. We therefore investigate the optimal policy for maximizing the expected sum rate over a finite communication horizon. We first show that this problem can be modelled within the Markov decision process (MDP) framework. We then demonstrate that the optimal open-loop and closed-loop policies can be found significantly faster than with the standard backward-induction algorithm.
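To make the sense-or-communicate trade-off concrete, the following is a minimal sketch of the standard backward-induction baseline on a toy model. Everything specific in it is an assumption made for illustration and is not taken from the paper: the interferer is active independently in each slot with an unknown probability, the belief about that probability is a Beta distribution tracked by the counts `a` (active) and `b` (idle), and the link adapts between two hypothetical rates `R_LO` and `R_HI`. It does not implement the faster method the abstract refers to.

```python
from functools import lru_cache

# Hypothetical rates, chosen only for illustration.
R_LO = 1.0  # robust rate, assumed decodable even when the interferer is active
R_HI = 3.0  # aggressive rate, assumed decodable only when the interferer is idle


def comm_reward(a: int, b: int) -> float:
    """Expected rate of one communication slot under a Beta(a, b) belief.

    a and b count sensed 'active' and 'idle' slots (prior counts included).
    Link adaptation picks whichever rate has the higher expected value.
    """
    p_idle = b / (a + b)
    return max(R_LO, R_HI * p_idle)


def q_values(slots_left: int, a: int, b: int) -> tuple[float, float]:
    """Return the expected sum rate of (communicate, sense) in the current slot."""
    p_active = a / (a + b)
    # Half-duplex: communicating earns a rate but leaves the belief unchanged.
    q_comm = comm_reward(a, b) + value(slots_left - 1, a, b)
    # Sensing earns nothing now but updates the belief with the observation.
    q_sense = (p_active * value(slots_left - 1, a + 1, b)
               + (1.0 - p_active) * value(slots_left - 1, a, b + 1))
    return q_comm, q_sense


@lru_cache(maxsize=None)
def value(slots_left: int, a: int, b: int) -> float:
    """Optimal expected sum rate with slots_left slots remaining (backward induction)."""
    if slots_left == 0:
        return 0.0
    return max(q_values(slots_left, a, b))


def best_action(slots_left: int, a: int, b: int) -> str:
    """Closed-loop decision for the current belief state; ties favour communicating."""
    q_comm, q_sense = q_values(slots_left, a, b)
    return "communicate" if q_comm >= q_sense else "sense"
```

Calling `best_action(slots_left, a, b)` after each slot, with the counts updated by whatever was sensed, yields a closed-loop policy for this toy model. An open-loop policy would instead fix the sequence of sense and communicate slots before any observation is made.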