We consider a scenario where multiple users, powered by energy harvesting, send version updates over a fading multiple access channel (MAC) to an access point (AP). Version updates with random importance weights arrive at each user according to an exogenous arrival process, and a new version renders all previous versions obsolete. Because energy harvesting imposes a time-varying peak power constraint, it is not possible to deliver all the bits of a version instantaneously. Accordingly, the AP aims to minimize a finite-horizon time-averaged expectation of the product of a version's importance weight and a convex increasing function of the number of its bits that remain to be transmitted at each time instant. This objective encourages importance-aware delivery of as many bits as possible, as soon as possible. In this setup, the AP optimizes the objective subject to the achievable rate-region constraint of the MAC and the energy constraints at the users, by deciding the transmit power and the number of bits to be transmitted by each user. We obtain a Markov Decision Process (MDP)-based optimal online policy for the problem and derive structural properties of the policy. We then develop a neural network (NN)-based online heuristic policy, for which we train an NN on the optimal offline policy derived for different sample paths of the energy, version arrival, and channel power gain processes. Via numerical simulations, we observe that the NN-based online policy performs competitively with the MDP-based online policy.
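To make the objective concrete, the following is a minimal single-user sketch, not the policy derived in this work. Every specific in it is an assumption introduced for illustration: the convex function f(b) = b^2, unit-length slots (so energy and power coincide), the rate expression log2(1 + h p), the convention that the cost is charged after each slot's transmission, and the greedy rule that spends whatever energy is available. The names `remaining_bits_cost` and `simulate_greedy` are hypothetical.

```python
import math


def remaining_bits_cost(weight, remaining_bits, f=lambda b: b ** 2):
    """Per-slot cost: importance weight times a convex increasing
    function f of the bits still to be sent (f(b) = b^2 is an assumption)."""
    return weight * f(remaining_bits)


def simulate_greedy(energy_arrivals, channel_gains, version_arrivals,
                    f=lambda b: b ** 2):
    """Time-averaged cost of a greedy single-user policy (illustrative only).

    energy_arrivals[t]  : energy harvested at the start of slot t
    channel_gains[t]    : channel power gain h in slot t (assumed > 0)
    version_arrivals[t] : None, or (weight, num_bits) for a new version
    """
    battery = 0.0
    weight, remaining = 0.0, 0.0
    total_cost = 0.0
    horizon = len(energy_arrivals)
    for t in range(horizon):
        battery += energy_arrivals[t]
        if version_arrivals[t] is not None:
            # A new version renders the previous one obsolete.
            weight, remaining = version_arrivals[t]
        h = channel_gains[t]
        # Power that would clear all remaining bits in one unit slot,
        # assuming the rate log2(1 + h * p) bits per slot.
        power_needed = (2.0 ** remaining - 1.0) / h
        # Harvested energy caps the power usable in this slot.
        power = min(battery, power_needed)
        bits = min(remaining, math.log2(1.0 + h * power))
        battery -= power
        remaining -= bits
        # Assumed convention: cost is charged on bits left after transmission.
        total_cost += remaining_bits_cost(weight, remaining, f)
    return total_cost / horizon


if __name__ == "__main__":
    avg = simulate_greedy(
        energy_arrivals=[1.0, 1.0, 1.0],
        channel_gains=[1.0, 1.0, 1.0],
        version_arrivals=[(2.0, 2.0), None, None],
    )
    print(avg)
```

In this toy run the 2-bit version cannot be cleared in the first slot because only one unit of energy is available, which is the peak-power effect described above. An optimal policy would additionally weigh future energy, channel, and version arrivals rather than spending greedily.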