The move toward artificial intelligence (AI)-native sixth-generation (6G) networks has placed greater emphasis on explainability and trustworthiness in network management operations, especially for mission-critical use cases. Such trust transcends traditional post-hoc explainable AI (XAI) methods, instead using contextual explanations to guide the learning process in an in-hoc manner. This paper proposes a novel graph reinforcement learning (GRL) framework named TANGO, which relies on a symbolic subsystem. The subsystem consists of a Bayesian graph neural network (GNN) Explainer, whose outputs, in terms of edge/node importance and uncertainty, are periodically translated into a logical GRL reward function. This adjustment is accomplished through symbolic reasoning rules defined within a Reasoner. Considering a real-world testbed proof-of-concept (PoC), a gNodeB (gNB) radio resource allocation problem is formulated that aims to minimize under- and over-provisioning of physical resource blocks (PRBs) while penalizing decisions emanating from uncertain and less important edge-node relations. Our findings reveal that the proposed in-hoc explainability solution significantly expedites convergence compared to a standard GRL baseline and other benchmarks in the deep reinforcement learning (DRL) domain. The experiments evaluate performance across AI, complexity, energy-consumption, robustness, network, scalability, and explainability metrics. Specifically, the results show that TANGO achieves a noteworthy accuracy of 96.39% in terms of optimal PRB allocation in the inference phase, outperforming the baseline by 1.22x.
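To make the in-hoc reward-shaping idea concrete, the sketch below shows one plausible form of such a reward: a symmetric penalty on PRB under-/over-provisioning combined with an extra penalty when the acting policy relies on edges that the Bayesian GNN Explainer flags as low-importance and high-uncertainty. All function and parameter names here (`shaped_reward`, the thresholds, the penalty weight) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
def shaped_reward(allocated_prbs, demanded_prbs,
                  edge_importance, edge_uncertainty,
                  importance_floor=0.5, uncertainty_cap=0.3,
                  penalty_weight=1.0):
    """Hypothetical in-hoc GRL reward in the spirit of TANGO.

    Penalizes PRB under- and over-provisioning symmetrically, and adds a
    symbolic-rule-style penalty for each edge the explainer marks as both
    unimportant (importance below a floor) and uncertain (uncertainty above
    a cap). Thresholds and weights are assumed values, not from the paper.
    """
    # Provisioning error: same cost for allocating too few or too many PRBs.
    provisioning_error = abs(allocated_prbs - demanded_prbs)

    # Count edges that violate the (assumed) symbolic rule
    # "importance >= floor OR uncertainty <= cap".
    bad_edges = sum(
        1 for imp, unc in zip(edge_importance, edge_uncertainty)
        if imp < importance_floor and unc > uncertainty_cap
    )

    return -provisioning_error - penalty_weight * bad_edges
```

In this sketch, a perfect allocation backed entirely by important, low-uncertainty edges yields a reward of zero, and the Reasoner's role corresponds to periodically adjusting the thresholds and weight from the Explainer's outputs.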