The complexity of emerging sixth-generation (6G) wireless networks has sparked an upsurge in the adoption of artificial intelligence (AI) to address the challenges of network management and resource allocation under strict service level agreements (SLAs). This inaugurates the era of massive network slicing as a distributive technology in which tenancy is extended to the final consumer through the pervasive digitalization of immersive vertical use cases. Despite the promising performance of deep reinforcement learning (DRL) in network slicing, the lack of transparency and interpretability of opaque models keeps users from trusting the decisions or predictions of DRL agents. This problem becomes even more pronounced when highly reliable and secure services must be provisioned. Leveraging eXplainable AI (XAI) in conjunction with an explanation-guided approach, we propose an eXplainable reinforcement learning (XRL) scheme to overcome the opaqueness of black-box DRL. The core concept behind the proposed method is the intrinsic interpretability of the reward hypothesis, which aims to encourage DRL agents to learn the best actions for specific network slice states while coping with the complex and conflict-prone relations among state-action pairs. To validate the proposed framework, we target a resource allocation optimization problem in which multi-agent XRL strives to optimally allocate the available radio resources to meet the SLA requirements of slices. Finally, we present numerical results showing that the adopted XRL approach outperforms the DRL baseline. To the best of our knowledge, this is the first work to study the feasibility of an explanation-guided DRL approach in the context of 6G networks.