The popularity of immersive videos has prompted extensive research into neural adaptive tile-based streaming to optimize video transmission over bandwidth-limited networks. However, existing neural adaptive approaches to viewport prediction and bitrate selection have not yet fully addressed the diversity of users' viewing patterns and Quality of Experience (QoE) preferences. Their performance can deteriorate significantly when users' actual viewing patterns and QoE preferences differ considerably from those observed during training, resulting in poor generalization. In this paper, we propose MANSY, a novel streaming system that embraces user diversity to improve generalization. Specifically, to accommodate users' diverse viewing patterns, we design a Transformer-based viewport prediction model with an efficient multi-viewport trajectory input-output architecture based on implicit ensemble learning. In addition, for the first time, we combine advanced representation learning with deep reinforcement learning to train the bitrate selection model to maximize diverse QoE objectives, enabling the model to generalize across users with diverse preferences. Extensive experiments demonstrate that MANSY outperforms state-of-the-art approaches in viewport prediction accuracy and QoE improvement on both trained and unseen viewing patterns and QoE preferences, achieving better generalization.