Quality of Experience~(QoE)-driven adaptive bitrate~(ABR) algorithms are typically optimized using QoE models based on the mean opinion score~(MOS). However, such models may not account for user heterogeneity in how rating scales are used, which can lead to unexpected behaviors. In this paper, we propose Jade, which leverages reinforcement learning with human feedback~(RLHF) techniques to better align with users' opinion scores. Jade's rank-based QoE model considers the relative values of user ratings, rather than their absolute values, to interpret the subjective perception of video sessions. We implement both linear-based and Deep Neural Network~(DNN)-based architectures to achieve both accuracy and generalization ability. We further propose entropy-aware reinforced mechanisms for training ABR policies with the proposed QoE models. Experimental results demonstrate that Jade performs favorably on conventional metrics, such as video quality and stall ratio, and improves QoE by 8.09% to 38.13% across different network conditions. These results emphasize the importance of user heterogeneity in QoE modeling and the potential of combining linear-based and DNN-based models to improve performance.
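To make the idea of a rank-based QoE model concrete, the following is a minimal sketch, not Jade's actual implementation. It assumes that ratings are compared only between sessions scored by the same user, so that each user's personal offset on the rating scale cancels out, and it fits a linear QoE model with a Bradley-Terry pairwise loss, a common choice for reward modeling in RLHF. The session features (normalized bitrate, stall seconds, number of bitrate switches), the function names `within_user_pairs`, `qoe`, and `fit_linear_rank_model`, and the hyperparameters are all hypothetical.

```python
import math
from itertools import combinations


def within_user_pairs(ratings):
    """Turn {user: [(features, score), ...]} into (preferred, other) pairs.

    Pairs are formed only between sessions rated by the same user, so each
    user's personal bias on the rating scale does not enter the comparison.
    """
    pairs = []
    for sessions in ratings.values():
        for (xa, sa), (xb, sb) in combinations(sessions, 2):
            if sa > sb:
                pairs.append((xa, xb))
            elif sb > sa:
                pairs.append((xb, xa))
            # Ties carry no ranking information and are skipped.
    return pairs


def qoe(w, x):
    """Hypothetical linear QoE score: weighted sum of session features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_linear_rank_model(pairs, dim, lr=0.1, epochs=500):
    """Minimize the mean Bradley-Terry loss -log sigmoid(q(x_win) - q(x_lose))."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for win, lose in pairs:
            margin = qoe(w, win) - qoe(w, lose)
            # d/d(margin) of -log(sigmoid(margin)) equals -1 / (1 + exp(margin)).
            coef = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                grad[i] += coef * (win[i] - lose[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Usage: a harsh and a lenient rater. Features are hypothetical
# (normalized bitrate, stall seconds, bitrate switches).
ratings = {
    "harsh_user": [((0.9, 0.0, 1), 3), ((0.4, 2.0, 4), 1)],
    "lenient_user": [((0.8, 0.5, 2), 5), ((0.5, 3.0, 5), 4)],
}
pairs = within_user_pairs(ratings)
w = fit_linear_rank_model(pairs, dim=3)
```

In this toy data, the harsh user's best session (score 3) is rated below the lenient user's worst session (score 4), so an absolute MOS regression would conflate rater bias with session quality; the pairwise formulation only uses each user's own ordering. A DNN-based variant would replace `qoe` with a neural network trained on the same pairwise loss.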