We investigate the applicability of deep reinforcement learning to the adaptive initial access beam alignment problem in mmWave communications, using the state-of-the-art proximal policy optimization (PPO) algorithm as an example. Compared with recent unsupervised-learning-based approaches to this problem, deep reinforcement learning has the potential to address a wider range of applications: in principle, training requires no (differentiable) model of the channel and/or the whole system, only agent-environment interactions, whether collected online or taken from a recorded dataset. We show that, although the chosen off-the-shelf deep reinforcement learning agent performs poorly when trained on realistic problem sizes, introducing action space shaping in the form of beamforming modules vastly improves performance without sacrificing much generalizability. With this add-on, the agent delivers performance competitive with various state-of-the-art methods in simulated environments, even at realistic problem sizes. This demonstrates that, with well-directed modifications, deep reinforcement learning may be able to compete with other approaches in this area, which opens up many straightforward extensions to similar scenarios.
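The idea of action space shaping through a beamforming module can be illustrated with a minimal sketch. The abstract does not specify how the modules are built, so everything below is an assumption made for illustration: a uniform linear array with half-wavelength spacing, a single-path channel, and a hypothetical `beamforming_module` that maps one scalar action to a full weight vector. The point is only that the agent then outputs one number instead of one complex weight per antenna.

```python
import numpy as np


def steering_vector(sin_angle: float, num_antennas: int) -> np.ndarray:
    """Unit-norm weights for a half-wavelength-spaced uniform linear array.

    Assumption for illustration: the array geometry is not taken from the paper.
    """
    n = np.arange(num_antennas)
    return np.exp(1j * np.pi * n * sin_angle) / np.sqrt(num_antennas)


def beamforming_module(action: float, num_antennas: int) -> np.ndarray:
    """Hypothetical shaped action space: one scalar in [-1, 1] selects a beam.

    Without shaping, the agent would have to output 2 * num_antennas real
    numbers (real and imaginary parts of every antenna weight).
    """
    sin_angle = float(np.clip(action, -1.0, 1.0))
    return steering_vector(sin_angle, num_antennas)


def beamforming_gain(weights: np.ndarray, channel: np.ndarray) -> float:
    """Received power |w^H h|^2 for weights w and channel vector h."""
    # np.vdot conjugates its first argument, which gives w^H h.
    return float(np.abs(np.vdot(weights, channel)) ** 2)


if __name__ == "__main__":
    num_antennas = 64
    # Assumed single-path channel arriving from sin(angle) = 0.3.
    channel = np.exp(1j * np.pi * np.arange(num_antennas) * 0.3)

    aligned = beamforming_gain(beamforming_module(0.3, num_antennas), channel)
    misaligned = beamforming_gain(beamforming_module(-0.5, num_antennas), channel)
    print(f"aligned gain: {aligned:.2f}, misaligned gain: {misaligned:.4f}")
```

In a setup like this, the learning algorithm (PPO in the abstract) only has to search over beam directions, which is a much smaller problem than searching over raw antenna weights. The trade-off is that the agent can only produce beams the module is able to express.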