Hybrid Feedback-Guided Optimal Learning for Wireless Interactive Panoramic Scene Delivery

Immersive applications such as virtual and augmented reality impose stringent requirements on frame rate, latency, and synchronization between physical and virtual environments. To meet these requirements, an edge server must render panoramic content, predict user head motion, and transmit a portion of the scene that is large enough to cover the user viewport while remaining within wireless bandwidth constraints. Each portion produces two feedback signals: prediction feedback, indicating whether the selected portion covers the actual viewport, and transmission feedback, indicating whether the corresponding packets are successfully delivered. Prior work models this problem as a multi-armed bandit with two-level bandit feedback, but fails to exploit the fact that prediction feedback can be retrospectively computed for all candidate portions once the user head pose is observed. As a result, prediction feedback constitutes full-information feedback rather than bandit feedback. Motivated by this observation, we introduce a two-level hybrid feedback model that combines full-information and bandit feedback, and formulate the portion selection problem as an online learning task under this setting. We derive an instance-dependent regret lower bound for the hybrid feedback model and propose AdaPort, a hybrid learning algorithm that leverages both feedback types to improve learning efficiency. We further establish an instance-dependent regret upper bound that asymptotically matches the lower bound, and demonstrate through real-world trace-driven simulations that AdaPort consistently outperforms state-of-the-art baseline methods.
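
The hybrid feedback structure described above can be illustrated with a minimal Python sketch. It is not AdaPort, whose details the abstract does not give. It only shows the bookkeeping: coverage is updated for every candidate portion once the head pose is known (full information), while delivery is updated only for the portion actually sent (bandit). The class `HybridFeedbackStats`, the UCB-style selection index, the one-dimensional angular model, and all numeric values are assumptions made for illustration.

```python
import math
import random


class HybridFeedbackStats:
    """Per-portion statistics under two-level hybrid feedback (illustrative)."""

    def __init__(self, num_portions):
        self.rounds = 0
        self.cover_hits = [0] * num_portions    # updated for ALL portions
        self.tx_pulls = [0] * num_portions      # updated for the chosen one only
        self.tx_successes = [0] * num_portions

    def update(self, chosen, cover_all, tx_ok):
        """cover_all[k]: would portion k have covered the viewport this round?"""
        self.rounds += 1
        for k, covered in enumerate(cover_all):
            self.cover_hits[k] += int(covered)
        self.tx_pulls[chosen] += 1
        self.tx_successes[chosen] += int(tx_ok)

    def index(self, k):
        """Assumed heuristic index: empirical coverage times optimistic delivery."""
        if self.tx_pulls[k] == 0:
            return math.inf  # force one transmission of every portion
        p_cover = self.cover_hits[k] / self.rounds
        p_tx = self.tx_successes[k] / self.tx_pulls[k]
        bonus = math.sqrt(2.0 * math.log(self.rounds) / self.tx_pulls[k])
        return p_cover * min(1.0, p_tx + bonus)

    def select(self):
        return max(range(len(self.tx_pulls)), key=self.index)


def simulate(num_rounds, seed=0):
    """Toy environment with hypothetical parameters, for illustration only."""
    rng = random.Random(seed)
    half_widths = [10.0, 20.0, 30.0]   # portion half-widths in degrees (assumed)
    tx_success_prob = [0.95, 0.8, 0.5]  # larger portions are harder to deliver (assumed)
    stats = HybridFeedbackStats(len(half_widths))
    for _ in range(num_rounds):
        chosen = stats.select()
        # Head-pose prediction error, observed after the user moves.
        error = rng.uniform(-25.0, 25.0)
        # Prediction feedback: computable retrospectively for every portion.
        cover_all = [abs(error) <= w for w in half_widths]
        # Transmission feedback: observed only for the portion that was sent.
        tx_ok = rng.random() < tx_success_prob[chosen]
        stats.update(chosen, cover_all, tx_ok)
    return stats
```

Calling `simulate(1000)` returns the accumulated statistics. After any number of rounds, `cover_hits` reflects every round for every portion, whereas `tx_pulls` sums to the number of rounds across portions, which is the asymmetry the hybrid model exploits.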
