ASL360: AI-Enabled Adaptive Streaming of Layered 360$^\circ$ Video over UAV-assisted Wireless Networks

We propose ASL360, an adaptive deep reinforcement learning (DRL)-based scheduler for on-demand 360$^\circ$ video streaming to mobile VR users in next-generation wireless networks. We aim to maximize the overall Quality of Experience (QoE) of users served over a UAV-assisted 5G wireless network. Our system model comprises a macro base station (MBS) and a UAV-mounted base station, both of which use mmWave transmission to serve the users. The 360$^\circ$ video is encoded into dependent layers and segmented into tiles, allowing each user to schedule the download of each layer's segments. Each user maintains multiple buffers, each storing the segments of its corresponding video layer. We model the scheduling decision as a Constrained Markov Decision Process (CMDP), in which the agent selects Base or Enhancement layers to maximize the QoE, and we use Proximal Policy Optimization (PPO), a policy-gradient method, to find the optimal policy. Additionally, we implement a dynamic adjustment mechanism for the cost components that lets the system adaptively balance and prioritize video quality, buffer occupancy, and quality change according to real-time network and streaming-session conditions. We demonstrate that ASL360 significantly improves the QoE relative to competitive baseline methods, achieving approximately 2 dB higher average video quality, 80% lower average rebuffering time, and 57% lower video quality variation. Our results show the effectiveness of our layered and adaptive approach in enhancing the QoE of immersive video streaming applications, particularly in dynamic and challenging network environments.
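
The abstract does not specify how the cost components are adjusted. One common way to handle the constraints of a CMDP is Lagrangian relaxation, where each constrained cost gets a non-negative multiplier updated by dual ascent. The Python sketch below illustrates that idea only; it is an assumption, not ASL360's published mechanism. The class `AdaptiveCostWeights`, the function `step_costs`, the 2 s buffer target, the per-step budgets, and the step size are all hypothetical.

```python
class AdaptiveCostWeights:
    """Hypothetical Lagrange multipliers for the buffer and quality-variation costs."""

    def __init__(self, budgets, lr=0.05):
        self.budgets = dict(budgets)               # allowed average cost per step (assumed)
        self.lr = lr                               # dual-ascent step size (assumed)
        self.lambdas = {k: 0.0 for k in budgets}   # one multiplier per constrained cost

    def reward(self, quality_db, costs):
        # Lagrangian reward: video quality minus multiplier-weighted constraint costs.
        return quality_db - sum(self.lambdas[k] * costs[k] for k in self.lambdas)

    def update(self, avg_costs):
        # Projected dual ascent: a cost above its budget raises its multiplier,
        # a cost below its budget lowers it, and multipliers never go negative.
        for k in self.lambdas:
            step = self.lr * (avg_costs[k] - self.budgets[k])
            self.lambdas[k] = max(0.0, self.lambdas[k] + step)


def step_costs(quality_db, prev_quality_db, buffer_s, target_buffer_s=2.0):
    # Hypothetical per-step costs: buffer shortfall below a target level (seconds)
    # and absolute quality change between consecutive segments (dB).
    return {
        "buffer": max(0.0, target_buffer_s - buffer_s),
        "variation": abs(quality_db - prev_quality_db),
    }


# Usage: the buffer constraint is violated on average, the variation constraint is not.
weights = AdaptiveCostWeights(budgets={"buffer": 0.2, "variation": 1.0})
costs = step_costs(quality_db=38.0, prev_quality_db=36.5, buffer_s=1.0)
weights.update({"buffer": 1.0, "variation": 0.5})
shaped = weights.reward(38.0, costs)  # quality penalized only by the buffer term
```

In this sketch the multipliers play the role of adaptive weights: a persistently low buffer raises the buffer penalty until the policy keeps the buffer above target, while a satisfied constraint lets its weight decay back toward zero, so quality is prioritized again.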
