Long propagation delays in underwater acoustic networks (UWANs) cause spatio-temporal uncertainty, which limits the channel utilization of medium access control (MAC) protocols. Node mobility in autonomous underwater vehicle scenarios exacerbates this problem by making propagation delays time-varying and the spatial topology dynamic. We present MobiU-MAC, a deep reinforcement learning (DRL)-based MAC protocol for mobile node access in UWANs that maximizes throughput through autonomous learning. MobiU-MAC incorporates CHILL-STER, a novel DRL algorithm designed for UWANs that is both ranging-free and delay-robust. CHILL-STER uses a credit horizon-limited $\lambda$-return (CHILL-Return) mechanism to learn stably under asynchronous delayed rewards, while its companion spatio-temporal experience replay (STER) mechanism handles the topological changes caused by node mobility. We also show theoretically that, under long propagation delays and without ranging, DRL can learn an optimal policy equivalent to that of a standard Markov decision process. Performance evaluations indicate that MobiU-MAC outperforms existing DRL-based MAC protocols for UWANs by leveraging the maximum system delay bound without ranging overhead, which supports the effectiveness of the proposed theory and algorithm in complex, dynamic underwater environments.