We consider a cellular network equipped with cache-enabled base stations (BSs) that leverage an orthogonal multipoint multicast (OMPMC) streaming scheme. The network operates in a time-slotted fashion, serving content-requesting users by streaming cached files. Users not satisfied by the multicast streaming experience a delivery outage, which implies that they remain interested in the same content at the next time slot; this induces forward dynamics in the user preferences. To design a latency-optimal streaming policy, the latency dynamics are properly modeled and included in the learning procedure. We show that, surprisingly, these latency dynamics are backward dynamics. Combining the problem's forward and backward dynamics yields a forward-backward Markov decision process (FB-MDP) that fully captures the network's evolution across time. This FB-MDP calls for a forward-backward multi-objective reinforcement learning (FB-MORL) algorithm that optimizes the expected latency together with other performance metrics of interest, including the overall outage probability and the total resource consumption. Simulation results show the merit of the proposed FB-MORL algorithm in finding a promising dynamic cache policy.
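To illustrate why preferences propagate forward in time while latency is naturally computed backward, here is a minimal toy sketch. It assumes a single user, a known per-slot outage sequence, one-slot granularity, and truncation of latency at the horizon. The function names, the `new_requests` input, and the truncation rule are hypothetical illustrations, not the FB-MDP model itself.

```python
from typing import List


def forward_preferences(initial_pref: int, outages: List[bool],
                        new_requests: List[int]) -> List[int]:
    """Forward pass (toy assumption): a user in outage at slot t keeps the
    same preference at slot t+1; otherwise it issues a fresh request,
    taken here from the hypothetical list new_requests[t+1]."""
    prefs = [initial_pref]
    for t in range(len(outages) - 1):
        prefs.append(prefs[t] if outages[t] else new_requests[t + 1])
    return prefs


def backward_latency(outages: List[bool]) -> List[int]:
    """Backward pass (toy assumption): latency, in slots, of a request
    pending at slot t, via D_t = 1 if served at t, else 1 + D_{t+1}.
    A request still unserved at the horizon is counted only up to the
    horizon (hypothetical truncation)."""
    latency = [0] * len(outages)
    nxt = 0  # latency beyond the horizon is truncated to zero
    for t in reversed(range(len(outages))):
        latency[t] = 1 if not outages[t] else 1 + nxt
        nxt = latency[t]
    return latency


if __name__ == "__main__":
    outages = [True, True, False, True, False]
    print(forward_preferences(7, outages, [0, 1, 2, 3, 4]))  # [7, 7, 7, 3, 3]
    print(backward_latency(outages))                          # [3, 2, 1, 2, 1]
```

The preference at slot t+1 depends only on slot t. In contrast, the latency seen at slot t depends on outcomes at slots t, t+1, and so on, so it can only be resolved by sweeping from the horizon back to the start.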