Clipped Affine Policy: Low-Complexity Near-Optimal Online Power Control for Energy Harvesting Communications over Fading Channels

This paper investigates online power control for point-to-point energy harvesting communications over wireless fading channels. A linear-policy-based approximation is derived for the relative-value function in the Bellman equation of the power control problem. This approximation leads to two fundamental power control policies, the optimistic and the robust clipped affine policies, both of which take the form of a clipped affine function of the battery level and the reciprocal of the channel signal-to-noise ratio (SNR) coefficient. They are essentially battery-limited weighted directional waterfilling policies operating between adjacent time slots. By leveraging the relative-value approximation and the derived policies, a domain-knowledge-enhanced reinforcement learning (RL) algorithm is proposed for online power control. The proposed approach is further extended to scenarios with energy and/or channel lookahead. Comprehensive simulation results demonstrate that the proposed methods achieve a good balance between computational complexity and optimality. In particular, the robust clipped affine policy (combined with RL, using at most five parameters) outperforms all existing approaches across various scenarios, with less than 2% performance loss relative to the optimal policy.
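To make the policy form described above concrete, the following is a minimal sketch of a clipped affine power rule. It is not the paper's derived policy: the function names, the coefficients `a`, `c`, `d`, and the simple battery update are hypothetical and chosen only for illustration. The sketch assumes that the transmit power is clipped to the interval between zero and the current battery level, and that harvested energy is stored up to a fixed capacity.

```python
import math


def clipped_affine_power(battery, snr_coeff, a, c, d):
    """Transmit power as a clipped affine function of the battery level
    and the reciprocal of the channel SNR coefficient.

    a, c, d are illustrative coefficients (assumptions, not values from
    the paper). A negative c mimics waterfilling, where a poor channel
    (large 1 / snr_coeff) lowers the allocated power.
    """
    if snr_coeff <= 0.0:
        return 0.0  # no usable channel, so spend nothing
    raw = a * battery + c / snr_coeff + d
    # Clip: power cannot be negative or exceed the stored energy.
    return min(max(raw, 0.0), battery)


def next_battery(battery, power, harvested, capacity):
    """Assumed battery update: spend, then store harvested energy up to capacity."""
    return min(battery - power + harvested, capacity)


def slot_rate(power, snr_coeff):
    """Achievable rate in one slot, in bits per channel use."""
    return math.log2(1.0 + snr_coeff * power)


if __name__ == "__main__":
    battery = 10.0
    # Each pair is (channel SNR coefficient, harvested energy) for one slot.
    for snr_coeff, harvested in [(2.0, 1.0), (0.1, 3.0), (5.0, 0.5)]:
        power = clipped_affine_power(battery, snr_coeff, a=0.5, c=-1.0, d=0.0)
        print(f"battery={battery:.2f} power={power:.2f} "
              f"rate={slot_rate(power, snr_coeff):.3f}")
        battery = next_battery(battery, power, harvested, capacity=10.0)
```

With only three coefficients, evaluating the rule costs a handful of arithmetic operations per slot, which is the sense in which a policy of this shape is low-complexity. How the coefficients are actually obtained (from the relative-value approximation or tuned by RL) is not specified here and is left out of the sketch.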
