The increased use of video conferencing applications (VCAs) has made it critical for all stakeholders in the VCA ecosystem to understand and support end-user quality of experience (QoE). This is especially true for network operators, who typically do not have direct access to client software. Existing VCA QoE estimation methods use passive measurements of application-level Real-time Transport Protocol (RTP) headers. However, network operators cannot always access RTP headers, particularly when VCAs use custom RTP protocols (e.g., Zoom) or when system constraints prevent it (e.g., legacy measurement systems). Given this challenge, this paper considers using more standard features of network traffic, namely IP and UDP headers, to provide per-second estimates of key VCA QoE metrics such as frame rate and video resolution. We develop a machine learning (ML) method that combines flow statistics (e.g., throughput) with features derived from the mechanisms VCAs use to fragment video frames into packets. We evaluate our method on three prevalent VCAs running over WebRTC: Google Meet, Microsoft Teams, and Cisco Webex. Our evaluation uses 54,696 seconds of VCA data collected from both (1) controlled in-lab network conditions and (2) real-world networks in 15 households. We show that the ML-based approach achieves accuracy similar to RTP-based methods despite using only IP/UDP data. For instance, we can estimate the frame rate to within 2 frames per second (FPS) for up to 83.05% of one-second intervals in the real-world data, which is only 1.76% lower than when using application-level RTP headers.
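The idea of deriving features from frame fragmentation using only IP/UDP header fields can be illustrated with a minimal sketch. It assumes we observe just per-packet arrival times and UDP payload sizes for a single video flow, and it treats consecutive packets whose sizes differ by at most a few bytes and that arrive within a few milliseconds of each other as fragments of the same video frame. The `Packet` type, the helper names, and the thresholds `size_tol` and `gap_tol` are hypothetical illustrations, not the paper's actual features or parameters. Per-second byte, packet, and frame counts of this kind are the sort of inputs an ML model could consume. The sketch also computes the "within 2 FPS" measure as the fraction of one-second intervals whose estimate falls within a tolerance of the ground truth.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Packet:
    ts: float  # arrival time in seconds, relative to capture start (assumed >= 0)
    size: int  # UDP payload length in bytes, taken from the UDP header


def group_into_frames(packets: Sequence[Packet], size_tol: int = 2,
                      gap_tol: float = 0.005) -> List[List[Packet]]:
    """Hypothetical heuristic: consecutive packets with near-equal sizes that
    arrive close together are treated as fragments of one video frame."""
    frames: List[List[Packet]] = []
    current: List[Packet] = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        same_frame = (
            bool(current)
            and abs(pkt.size - current[-1].size) <= size_tol
            and pkt.ts - current[-1].ts <= gap_tol
        )
        if same_frame:
            current.append(pkt)
        else:
            if current:
                frames.append(current)
            current = [pkt]
    if current:
        frames.append(current)
    return frames


def per_second_features(packets: Sequence[Packet], size_tol: int = 2,
                        gap_tol: float = 0.005) -> Dict[int, Dict[str, int]]:
    """Per-second flow statistics plus a heuristic frame count (an FPS estimate)."""
    feats: Dict[int, Dict[str, int]] = {}

    def bucket(sec: int) -> Dict[str, int]:
        return feats.setdefault(sec, {"bytes": 0, "packets": 0, "frames": 0})

    # Flow statistics: bytes and packets observed in each one-second interval.
    for pkt in packets:
        b = bucket(int(pkt.ts))
        b["bytes"] += pkt.size
        b["packets"] += 1
    # Frame count: attribute each frame to the second its last fragment arrived.
    for frame in group_into_frames(packets, size_tol, gap_tol):
        bucket(int(frame[-1].ts))["frames"] += 1
    return feats


def fraction_within(pred: Sequence[float], truth: Sequence[float],
                    tol: float = 2.0) -> float:
    """Fraction of one-second intervals whose estimate is within tol of truth."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(abs(p - t) <= tol for p, t in zip(pred, truth))
    return hits / len(pred)


if __name__ == "__main__":
    pkts = [Packet(0.000, 1000), Packet(0.001, 1001), Packet(0.002, 1000),
            Packet(0.033, 800), Packet(0.034, 800), Packet(1.000, 500)]
    print(per_second_features(pkts))
    print(fraction_within([30, 25, 10], [29, 30, 11]))
```

Truncating timestamps with `int` assumes non-negative, capture-relative times. A realistic pipeline would also need to separate video packets from audio and retransmission packets, which this sketch ignores.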