We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collision), and have no prior knowledge of the network size or of the transmission strategies of other sources. The goal of each source is to maximize its own throughput while striving for network-wide fairness. We propose a novel fully decentralized Reinforcement Learning (RL)-based solution that achieves fairness without coordination. The proposed Fair Share RL (FSRL) solution combines: (i) state augmentation with a semi-adaptive time reference; (ii) an architecture that leverages risk control and time difference likelihood; and (iii) a fairness-driven reward structure. We evaluate FSRL in more than 50 network settings with different numbers of agents and different amounts of available spectrum, in the presence of jammers, and in an ad-hoc setting. Simulation results suggest that, compared with a common baseline RL algorithm from the literature, FSRL can be up to 89.0% fairer (as measured by Jain's fairness index) in stringent settings with several sources and a single frequency band, and 48.1% fairer on average.
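For reference, the fairness metric quoted above is Jain's fairness index, J = (Σx_i)² / (n · Σx_i²) over the per-source throughputs x_i; it equals 1 when all sources receive equal throughput and 1/n when a single source captures everything. A minimal sketch of the computation (the function name and example throughput values are illustrative, not from the paper):

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Ranges from 1/n (one source gets all throughput) to 1.0
    (perfectly equal allocation across the n sources).
    """
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        return 0.0  # no throughput at all; index is undefined, report 0
    return (total * total) / (n * sum_sq)


# Illustrative allocations over 4 sources sharing one band:
print(jain_index([1.0, 1.0, 1.0, 1.0]))  # perfectly fair -> 1.0
print(jain_index([4.0, 0.0, 0.0, 0.0]))  # one source dominates -> 0.25
```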