The rapid growth of multimedia consumption, driven by major advances in mobile devices since the mid-2000s, has led to widespread use of video conferencing applications (VCAs) such as Zoom and Google Meet, as well as instant messaging applications (IMAs) such as WhatsApp and Telegram, which increasingly support video conferencing as a core feature. Many of these systems rely on the Web Real-Time Communication (WebRTC) protocol, which enables direct peer-to-peer media streaming without requiring a third-party server to relay data, thereby reducing latency and facilitating real-time communication. Despite WebRTC's potential, adverse network conditions can degrade streaming quality and consequently reduce users' Quality of Experience (QoE). Maintaining high QoE therefore requires continuous monitoring and timely intervention when QoE begins to deteriorate. While content providers can often estimate QoE by directly comparing transmitted and received media, this task is significantly more challenging for internet service providers (ISPs). End-to-end encryption, commonly used by modern VCAs and IMAs, prevents ISPs from accessing the original media stream, leaving only Quality of Service (QoS) and routing information available. To address this limitation, we propose the QoE Attention Convolutional Neural Network (qAttCNN), a model that uses the packet sizes of the encrypted traffic to infer two no-reference QoE metrics, namely BRISQUE and frames per second (FPS). We evaluate qAttCNN on a custom dataset collected from WhatsApp video calls and compare it against existing QoE models. Measured by mean absolute error percentage (MAEP), our approach achieves an error of 2.14% for BRISQUE prediction and 7.39% for FPS prediction.
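The abstract reports results in terms of MAEP but does not define it. As a minimal sketch, assuming MAEP is the per-sample absolute error divided by the ground-truth value, averaged and expressed as a percentage, it could be computed as follows. The function name `maep` and the sample FPS values are hypothetical and serve only as illustration; the authors' exact formulation may differ.

```python
from typing import Sequence


def maep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error percentage (ASSUMED definition, not taken from the paper):
    100 * mean(|y_true - y_pred| / |y_true|).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # A relative error is undefined when the ground truth is zero.
        raise ValueError("ground-truth values must be non-zero")
    # Per-sample absolute error relative to the ground-truth magnitude.
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical FPS values, for illustration only (not from the paper's dataset).
fps_true = [30.0, 25.0, 20.0]
fps_pred = [27.0, 25.0, 22.0]
print(f"MAEP: {maep(fps_true, fps_pred):.2f}%")
```

Under this assumed definition, a value such as the reported 7.39% for FPS would mean that predictions deviate from the measured frame rate by about 7.39% on average.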