In recent years, Non-Orthogonal Multiple Access (NOMA) has emerged as a promising candidate among multiple-access frameworks, and the evolution of deep machine learning has spurred active efforts to incorporate it into NOMA systems. The main motivation for these studies is the growing need to optimize the utilization of network resources, as the expansion of the Internet of Things (IoT) has made such resources scarce. NOMA addresses this need through power-domain multiplexing, allowing multiple users to access the network simultaneously. Nevertheless, the NOMA system has a few limitations. Several works have been proposed to mitigate them, including the optimization of power allocation known as the joint resource allocation (JRA) method, and the integration of JRA with deep reinforcement learning (JRA-DRL). Despite this, the channel assignment problem remains open and requires further investigation. In this paper, we propose a deep reinforcement learning framework that incorporates replay memory with an on-policy algorithm to allocate network resources in a NOMA system and generalize the learning. We also provide extensive simulations to evaluate the effects of varying the learning rate, batch size, model type, and the number of features in the state.
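The replay memory mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, transition layout, and the idea of storing resource-allocation decisions as `(state, action, reward, next_state)` tuples are assumptions for the sake of the example.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state) transitions.

    Illustrative sketch only: when paired with an on-policy algorithm, the
    buffer would typically be restricted to transitions gathered under the
    current policy and cleared after each update.
    """

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniformly sample a minibatch without replacement.
        return self.rng.sample(list(self.buffer), batch_size)

    def clear(self):
        # For on-policy use: discard stale data after a policy update.
        self.buffer.clear()

    def __len__(self):
        return len(self.buffer)

# Hypothetical usage: each transition records an allocation decision
# (channel/power assignment) and the resulting reward.
memory = ReplayMemory(capacity=1000)
for step in range(10):
    memory.push((f"state{step}", f"action{step}", 1.0, f"state{step + 1}"))
batch = memory.sample(4)
```

A uniform-sampling buffer like this decorrelates consecutive transitions when forming minibatches, which is the usual motivation for replay memory; the `clear()` hook indicates one simple way to keep the data on-policy.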