Extended Reality (XR) services are set to transform applications over 5th and 6th generation wireless networks, delivering immersive experiences. Concurrently, advances in Artificial Intelligence (AI) have expanded its role in wireless networks; however, trust in and transparency of AI remain to be strengthened, and providing explanations for AI-enabled systems can enhance that trust. We introduce Value Function Factorization (VFF)-based Explainable (X) Multi-Agent Reinforcement Learning (MARL) algorithms that explain reward design in XR codec adaptation through reward decomposition. We contribute four enhancements to XMARL algorithms. First, we detail the architectural modifications required to enable reward decomposition in VFF-based MARL algorithms: Value Decomposition Networks (VDN), Mixture of Q-Values (QMIX), and Q-Transformation (Q-TRAN). Second, inspired by multi-task learning, we reduce the overhead of vanilla XMARL algorithms. Third, we propose a new explainability metric, Reward Difference Fluctuation Explanation (RDFX), suited to problems with adjustable parameters. Last, we propose adaptive XMARL, which leverages network gradients and reward decomposition for improved action selection. Simulation results indicate that, in XR codec adaptation, the Packet Delivery Ratio reward is the primary contributor to optimal performance, compared with the initial composite reward that also included delay and Data Rate Ratio components. Modifications to VFF-based XMARL algorithms, incorporating multi-headed structures and adaptive loss functions, enable the best-performing algorithm, Multi-Headed Adaptive (MHA)-QMIX, to achieve average gains of up to 10.7%, 41.4%, 33.3%, and 67.9% over the Adjust Packet Size baseline in XR index, jitter, delay, and Packet Loss Ratio (PLR), respectively.
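The reward-decomposition idea underlying the XMARL algorithms above can be sketched numerically. In this minimal, illustrative example (not the paper's implementation), each agent emits one Q-value per reward component; a VDN-style mixer sums agent Q-values per component, the total team Q recombines the components for action selection, and the per-component contributions at the chosen action serve as the explanation signal. The component names and all shapes here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes: 3 agents, 4 codec actions, and 3 reward components
# (e.g., Packet Delivery Ratio, delay, Data Rate Ratio) -- assumed values.
n_agents, n_actions, n_components = 3, 4, 3

# Per-agent decomposed Q-values: shape (agents, actions, components).
# In a real system these would come from each agent's Q-network heads.
q_decomposed = rng.normal(size=(n_agents, n_actions, n_components))

# VDN-style mixing: the team Q for each component is the sum over agents.
q_team_per_component = q_decomposed.sum(axis=0)   # (actions, components)

# The total team Q recombines the components; greedy selection uses it.
q_team_total = q_team_per_component.sum(axis=1)   # (actions,)
greedy_action = int(np.argmax(q_team_total))

# Explanation signal: how much each reward component contributes
# to the value of the selected joint action.
contributions = q_team_per_component[greedy_action]  # (components,)
```

Inspecting `contributions` across training is what allows statements such as "the Packet Delivery Ratio reward is the primary contributor": the component with the consistently largest contribution dominates the learned value of the chosen actions.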