Underwater Acoustic (UWA) networks are vital for remote sensing and ocean exploration but face inherent challenges such as limited bandwidth, long propagation delays, and highly dynamic channels. These constraints hinder real-time communication and degrade overall system performance. To address these challenges, this paper proposes a bilevel Multi-Armed Bandit (MAB) framework. At the fast inner level, a Contextual Delayed MAB (CD-MAB) jointly adapts the modulation scheme and transmission power based on both the channel-state feedback and the Age of Information (AoI) of that feedback, thereby maximizing throughput. At the slower outer level, a Feedback Scheduling MAB dynamically adjusts the channel-state feedback interval according to throughput dynamics: stable throughput allows longer update intervals, while throughput drops trigger more frequent updates. This adaptive mechanism reduces feedback overhead and enhances responsiveness to varying network conditions. The proposed bilevel framework is computationally efficient and well-suited to resource-constrained UWA networks. Simulation results obtained with the DESERT Underwater Network Simulator demonstrate throughput gains of up to 20.61% and energy savings of up to 36.60% compared with Deep Reinforcement Learning (DRL) baselines reported in the existing literature.
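The two-timescale structure described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: both levels use plain UCB1 as a stand-in learner, rewards are observed immediately rather than with delay, and the toy channel (a bounded SNR random walk), the threshold table `REQUIRED_SNR_DB`, the arm sets `ACTIONS` and `INTERVALS`, and the weights `ENERGY_WEIGHT` and `FEEDBACK_COST` are all hypothetical values chosen only for illustration.

```python
import math
import random
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
REQUIRED_SNR_DB = {1: 4.0, 2: 9.0, 4: 16.0}   # bits/symbol -> minimum SNR (dB)
POWER_DB = [0.0, 3.0, 6.0]                    # transmit power offsets (dB)
ACTIONS = [(b, p) for b in REQUIRED_SNR_DB for p in POWER_DB]
INTERVALS = [1, 2, 4, 8]                      # feedback interval in slots
MAX_BITS, MAX_POWER_DB = 4, 6.0
ENERGY_WEIGHT, FEEDBACK_COST = 0.1, 0.05


class UCB1:
    """Plain UCB1 over a fixed arm list (stand-in learner)."""

    def __init__(self, arms, c=1.0):
        self.arms = list(arms)
        self.c = c
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self.total = 0

    def select(self):
        # Play every arm once before using the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [m + self.c * math.sqrt(2.0 * math.log(self.total) / n)
                  for m, n in zip(self.means, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i, reward):
        self.counts[i] += 1
        self.total += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


class ContextualBandit:
    """One independent UCB1 table per discrete context."""

    def __init__(self, arms):
        self.tables = defaultdict(lambda: UCB1(arms))

    def select(self, context):
        return self.tables[context].select()

    def update(self, context, i, reward):
        self.tables[context].update(i, reward)


def simulate(epochs=300, epoch_len=16, seed=0):
    rng = random.Random(seed)
    inner = ContextualBandit(ACTIONS)   # fast level: (modulation, power)
    outer = UCB1(INTERVALS)             # slow level: feedback interval
    snr, history = 10.0, []
    for _ in range(epochs):
        k = outer.select()
        interval = INTERVALS[k]
        total, n_feedback = 0.0, 0
        for t in range(epoch_len):
            # Toy channel: bounded random walk of the true SNR.
            snr = min(20.0, max(0.0, snr + rng.gauss(0.0, 0.8)))
            if t % interval == 0:       # fresh channel-state feedback arrives
                reported, aoi = snr, 0
                n_feedback += 1
            # Context = (quantized reported SNR, quantized age of that report).
            context = (int(reported // 4), min(aoi, 7) // 2)
            a = inner.select(context)
            bits, power_db = ACTIONS[a]
            ok = snr + power_db >= REQUIRED_SNR_DB[bits]
            reward = ((bits / MAX_BITS) if ok else 0.0) \
                - ENERGY_WEIGHT * power_db / MAX_POWER_DB
            inner.update(context, a, reward)
            total += reward
            aoi += 1
        # Outer reward: mean inner reward minus a feedback-overhead penalty.
        outer_reward = total / epoch_len - FEEDBACK_COST * n_feedback / epoch_len
        outer.update(k, outer_reward)
        history.append((interval, outer_reward))
    return inner, outer, history


if __name__ == "__main__":
    _, outer, _ = simulate()
    print(dict(zip(INTERVALS, outer.counts)))  # how often each interval was chosen
```

In this sketch the inner learner sees the last reported SNR together with its age, so a stale report maps to a different context than a fresh one. The outer learner only sees per-epoch averages, which is what places it on the slower timescale.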