Beam selection for joint transmission in cell-free massive multiple-input multiple-output systems suffers from extremely high training overhead and computational complexity. Traffic-aware quality-of-service requirements further complicate the beam selection problem. To address these issues, we propose a traffic-aware hierarchical beam selection scheme that operates on two timescales. On the long timescale, the central processing unit collects wide-beam responses from the base stations (BSs) and uses a convolutional neural network to predict the power profile in the narrow-beam space, based on which the cascaded multi-BS beam space is pruned. On the short timescale, we introduce a centralized reinforcement learning (RL) algorithm that selects beams to maximize the delay satisfaction rate over multiple consecutive time slots. Moreover, we put forward three scalable distributed algorithms: hierarchical distributed Lyapunov optimization, fully distributed RL, and RL with centralized training and decentralized execution. These improve scalability and offer a better tradeoff between performance and execution signaling overhead. Numerical results demonstrate that, compared to existing methods, the proposed schemes significantly reduce both model training cost and beam training overhead and meet user-specific delay requirements more readily.
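To illustrate why pruning the cascaded multi-BS beam space matters, the following is a minimal sketch, not the scheme proposed here. It assumes a simple top-K rule per BS and one narrow beam per BS in each joint selection; the abstract does not specify the actual pruning criterion. The function names `prune_beams` and `joint_candidates`, the parameter `keep`, and the power values are all hypothetical, and the predicted power profile stands in for the output of the convolutional neural network.

```python
from itertools import product
from math import prod


def prune_beams(predicted_power, keep):
    """Keep the `keep` strongest narrow beams for each BS (assumed top-K rule).

    predicted_power[b][n] is a hypothetical predicted power of narrow
    beam n at base station b, e.g. the output of a trained predictor.
    """
    pruned = []
    for powers in predicted_power:
        # Rank beam indices by predicted power, strongest first.
        ranked = sorted(range(len(powers)), key=lambda n: powers[n], reverse=True)
        # Store the surviving indices in ascending order for readability.
        pruned.append(sorted(ranked[:keep]))
    return pruned


def joint_candidates(pruned):
    """Enumerate joint selections, assuming one narrow beam per BS."""
    return list(product(*pruned))


# Toy profile: 3 BSs with 8 narrow beams each (made-up numbers).
predicted_power = [
    [0.1, 0.9, 0.8, 0.2, 0.1, 0.0, 0.3, 0.1],
    [0.0, 0.1, 0.2, 0.7, 0.6, 0.1, 0.0, 0.2],
    [0.5, 0.1, 0.0, 0.1, 0.2, 0.1, 0.9, 0.4],
]

pruned = prune_beams(predicted_power, keep=2)
candidates = joint_candidates(pruned)
full_size = prod(len(powers) for powers in predicted_power)

print(pruned)
print(len(candidates), "of", full_size, "joint beam combinations remain")
```

In this toy case the joint search space shrinks from 8 × 8 × 8 = 512 combinations to 2 × 2 × 2 = 8, which is the kind of reduction that makes a short-timescale selection over the remaining candidates tractable.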