Quality of Experience~(QoE)-driven adaptive bitrate~(ABR) algorithms are typically optimized against QoE models built on the mean opinion score~(MOS). Such models may not account for user heterogeneity in how rating scales are used, which can result in unexpected behaviors. In this paper, we propose \texttt{Jade}, which leverages reinforcement learning with human feedback~(RLHF) techniques to better align with users' opinion scores. \texttt{Jade}'s rank-based QoE model considers the relative values of user ratings to interpret the subjective perception of video sessions. We implement both linear-based and deep neural network~(DNN)-based architectures to achieve both accuracy and generalization ability. We further propose entropy-aware reinforced mechanisms for training policies that integrate the proposed QoE models. Experimental results demonstrate that \texttt{Jade} performs favorably on conventional metrics, such as quality and stall ratio, and improves QoE by 8.09\%--38.13\% across different network conditions. These results emphasize the importance of user heterogeneity in QoE modeling and the potential of combining linear-based and DNN-based models for performance improvement.
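To illustrate what a rank-based QoE model could look like, the following is a minimal sketch of a linear model trained on pairwise comparisons rather than on absolute scores. It is not \texttt{Jade}'s actual implementation: the feature layout (quality, stall ratio, quality switches), the Bradley-Terry style logistic loss, and the names `make_pairs`, `pairwise_rank_loss`, and `fit_linear_rank_qoe` are all assumptions made for illustration. Pairs are formed only within the same user, so two users who rate on different parts of the scale still contribute consistent orderings.

```python
import numpy as np


def make_pairs(features, ratings, user_ids):
    """Build (preferred, less preferred) feature pairs within each user.

    Only the relative order of one user's ratings is used, so a user who
    rates everything 4-5 and one who rates everything 1-2 are comparable.
    """
    preferred, other = [], []
    n = len(ratings)
    for i in range(n):
        for j in range(n):
            if user_ids[i] == user_ids[j] and ratings[i] > ratings[j]:
                preferred.append(features[i])
                other.append(features[j])
    return np.array(preferred), np.array(other)


def pairwise_rank_loss(w, feats_a, feats_b):
    """Mean logistic loss for 'session a is preferred over session b'."""
    margin = feats_a @ w - feats_b @ w
    # log(1 + exp(-margin)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -margin)))


def fit_linear_rank_qoe(feats_a, feats_b, lr=0.1, steps=500):
    """Fit linear QoE weights by plain gradient descent on the pairwise loss."""
    diff = feats_a - feats_b
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        margin = diff @ w
        # d/dw log(1 + exp(-m)) = -sigmoid(-m) * diff
        grad = -(diff * (1.0 / (1.0 + np.exp(margin)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w


# Hypothetical toy data: columns are [quality, stall ratio, quality switches].
features = np.array([
    [1.0, 0.0, 0.0],  # user 0, rated 5
    [0.5, 0.5, 0.0],  # user 0, rated 4
    [0.9, 0.1, 0.0],  # user 1, rated 2
    [0.2, 0.8, 0.0],  # user 1, rated 1
])
ratings = np.array([5, 4, 2, 1])
user_ids = np.array([0, 0, 1, 1])

feats_a, feats_b = make_pairs(features, ratings, user_ids)
weights = fit_linear_rank_qoe(feats_a, feats_b)
print("learned weights:", weights)
```

In this toy setup the two users never share a rating value, yet both orderings push the quality weight up and the stall weight down, which is the behavior a MOS average over raw scores would blur.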