We study a delay-constrained grant-free random access system with a multi-antenna base station. Users randomly generate data packets with expiration deadlines, and each user transmits its packets from a data queue in first-in, first-out (FIFO) order. To deliver a packet before it expires, a user must succeed in both the random access phase (sending a pilot without collision) and the data transmission phase (achieving a required data rate with imperfect channel information). We develop a distributed, cross-layer policy that lets the users dynamically and independently choose their pilots and transmit powers to achieve a high effective sum throughput while accounting for fairness. The policy design has three key components: 1) a proxy for the instantaneous data rate that depends only on macroscopic environment variables and transmission decisions, and that accounts for pilot collisions and imperfect channel estimation; 2) a quantitative, instantaneous measure of fairness within each communication round; and 3) a deep learning-based, multi-agent control framework with centralized training and distributed execution. Because the proposed framework is trained on an accurate, differentiable objective function, it achieves higher sample efficiency than a conventional application of model-free, multi-agent reinforcement learning algorithms. Simulations in highly dynamic and heterogeneous scenarios verify the performance of the proposed approach.
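The two-phase success condition for a packet can be illustrated with a toy round-based simulation. This is a minimal sketch under stated assumptions, not the authors' system model. The names `Packet`, `drop_expired`, and `run_round` are hypothetical. Achieved rates are passed in as given inputs rather than computed from the paper's rate proxy. A pilot counts as collision-free only when exactly one active user selects it. A head-of-line packet is discarded once the current round exceeds its deadline.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    arrival: int   # round in which the packet was generated
    deadline: int  # last round in which the packet may still be delivered


def drop_expired(queue: deque, t: int) -> int:
    """Discard head-of-line packets whose deadline has passed (FIFO order)."""
    dropped = 0
    while queue and queue[0].deadline < t:
        queue.popleft()
        dropped += 1
    return dropped


def run_round(queues, pilots, rates, required_rate, t):
    """Simulate one communication round (hypothetical toy model).

    pilots[k]: pilot index chosen by user k, or None if user k stays silent.
    rates[k]: rate user k would achieve this round (assumed given, not the
              paper's proxy). Returns the list of users that delivered a packet.
    """
    for q in queues:
        drop_expired(q, t)

    # Random access phase: count how many active users picked each pilot.
    counts = {}
    for k, q in enumerate(queues):
        if q and pilots[k] is not None:
            counts[pilots[k]] = counts.get(pilots[k], 0) + 1

    delivered = []
    for k, q in enumerate(queues):
        if not q or pilots[k] is None:
            continue
        collision_free = counts[pilots[k]] == 1
        # Data phase: the head packet leaves the queue only if both phases succeed.
        if collision_free and rates[k] >= required_rate:
            q.popleft()
            delivered.append(k)
    return delivered


if __name__ == "__main__":
    qs = [deque([Packet(0, 2)]), deque([Packet(0, 2)]),
          deque([Packet(0, 2)]), deque([Packet(0, 0)])]
    # Users 0 and 1 collide on pilot 0; user 3's packet expired before round 1.
    print(run_round(qs, pilots=[0, 0, 1, 1], rates=[2.0] * 4,
                    required_rate=1.0, t=1))
```

In this example, only user 2 delivers. Users 0 and 1 share pilot 0 and keep their packets for a later round. User 3's packet is dropped before the round starts because its deadline has already passed.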
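The abstract does not define its per-round fairness measure. As a stand-in, Jain's fairness index is a standard way to score how evenly a quantity, such as the rates served in one round, is spread across users. It is computed as J = (Σ xₖ)² / (n · Σ xₖ²). This index is a swapped-in, well-known metric used only for illustration and may differ from the authors' measure. The function name `jain_index` and the convention of returning 1.0 when all values are zero are assumptions.

```python
from typing import Sequence


def jain_index(values: Sequence[float]) -> float:
    """Jain's fairness index: 1.0 when all users are equal, 1/n when one user gets everything.

    Illustrative stand-in, not the paper's fairness measure.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one user")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        return 1.0  # assumed convention: all-zero allocation treated as perfectly equal
    return total * total / (n * sum_sq)


if __name__ == "__main__":
    print(jain_index([1.0, 1.0, 1.0, 1.0]))  # equal allocation
    print(jain_index([1.0, 0.0, 0.0, 0.0]))  # one user served
```

Because the index is scale-invariant and takes values in [1/n, 1], it can be compared across rounds with different total throughput.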