Estimating WebRTC Video QoE Metrics Without Using Application Headers

The increased use of video conferencing applications (VCAs) has made it critical for all stakeholders in the VCA ecosystem to understand and support end-user quality of experience (QoE), especially network operators, who typically do not have direct access to client software. Existing VCA QoE estimation methods rely on passive measurements of application-level Real-time Transport Protocol (RTP) headers. However, a network operator does not always have access to RTP headers, particularly when VCAs use custom RTP protocols (e.g., Zoom) or when system constraints apply (e.g., legacy measurement systems). Given this challenge, this paper considers the use of more standard features in the network traffic, namely IP and UDP headers, to provide per-second estimates of key VCA QoE metrics such as frame rate and video resolution. We develop a method that uses machine learning with a combination of flow statistics (e.g., throughput) and features derived from the mechanisms that VCAs use to fragment video frames into packets. We evaluate our method for three prevalent VCAs running over WebRTC: Google Meet, Microsoft Teams, and Cisco Webex. Our evaluation consists of 54,696 seconds of VCA data collected from both (1) controlled in-lab network conditions and (2) real-world networks from 15 households. We show that the ML-based approach yields accuracy similar to that of the RTP-based methods, despite using only IP/UDP data. For instance, we can estimate the frame rate to within 2 frames per second (FPS) for up to 83.05% of one-second intervals in the real-world data, which is only 1.76% lower than using the application-level RTP headers.
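
To make the per-second framing concrete, the following is a minimal sketch, not the paper's implementation, of how flow statistics might be aggregated into one-second windows from IP/UDP-level observations, and how a "within 2 FPS" accuracy figure could be computed. The function names, the input format (timestamp and payload size per packet), and the particular statistics are assumptions chosen for illustration; the fragmentation-derived features and the learning model itself are not specified in this abstract and are omitted.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def per_second_features(packets):
    """Aggregate (timestamp_seconds, payload_bytes) pairs into 1-second windows.

    Hypothetical helper: only packet timing and size are used, which is
    information available from IP/UDP headers without parsing RTP.
    """
    buckets = defaultdict(list)
    for timestamp, size in packets:
        buckets[math.floor(timestamp)].append(size)

    features = {}
    for second, sizes in sorted(buckets.items()):
        total_bytes = sum(sizes)
        features[second] = {
            "packet_count": len(sizes),
            "bytes": total_bytes,
            # The window is one second long, so bits per window = bits per second.
            "throughput_kbps": total_bytes * 8 / 1000,
            "mean_size": mean(sizes),
            "size_stdev": pstdev(sizes),
        }
    return features


def fraction_within(estimates, truths, tolerance=2.0):
    """Share of intervals whose estimate is within `tolerance` of ground truth."""
    pairs = list(zip(estimates, truths))
    if not pairs:
        raise ValueError("no intervals to compare")
    hits = sum(1 for est, true in pairs if abs(est - true) <= tolerance)
    return hits / len(pairs)


if __name__ == "__main__":
    # Synthetic packets, purely for illustration.
    demo = [(0.10, 1000), (0.50, 1000), (0.90, 500), (1.20, 1200)]
    for second, row in per_second_features(demo).items():
        print(second, row)
    print(fraction_within([30, 24, 15], [29, 27, 15]))
```

In a setup like this, each per-second feature row would be paired with a ground-truth label (for example, the frame rate reported by the client) to train and evaluate a model; the accuracy helper mirrors the way the result above is stated, as the share of one-second intervals whose estimate falls within a fixed tolerance.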
