The quality of the user experience has become one of the most important considerations in today's world, as it directly influences whether individuals continue using a product or service or abandon it. In this context, video conferencing applications (VCAs), which saw widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data, the widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Because ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics along the data path. In this work, we present a simple yet effective QoE prediction framework based on a nearly off-the-shelf convolutional neural network (CNN) architecture. It uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The framework is easy to implement and does not require high-end computational resources, yet it delivers superior prediction performance: in our experiments on two custom datasets collected from WhatsApp and Zoom, it achieves substantial improvements over previous models for the QoE prediction task.
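The sketch below makes the input representation concrete. It is a minimal, untrained illustration in plain Python and is not the authors' architecture. It assumes that packet sizes in bytes are scaled by a 1500-byte MTU and padded or truncated to 64 packets. They then pass through a single 1-D convolutional layer with ReLU and global average pooling, and the pooled features are mapped to two outputs read as BRISQUE and MOS. The layer sizes, the helper names (`prepare_input`, `conv1d_relu`, `TinyQoECNN`), and the random weights are all hypothetical. A real model would be trained on labeled calls in a deep learning framework.

```python
import random
from typing import List, Sequence, Tuple


def prepare_input(packet_sizes: Sequence[int], length: int = 64, mtu: int = 1500) -> List[float]:
    """Scale packet sizes to [0, 1] by an assumed MTU, then pad/truncate to a fixed length."""
    scaled = [min(size, mtu) / mtu for size in packet_sizes[:length]]
    return scaled + [0.0] * (length - len(scaled))


def conv1d_relu(x: List[float], kernels: List[List[float]], biases: List[float]) -> List[List[float]]:
    """'Valid' 1-D convolution (cross-correlation, as in most DL frameworks) followed by ReLU.
    Returns one feature map per kernel."""
    maps = []
    for kernel, bias in zip(kernels, biases):
        k = len(kernel)
        fmap = []
        for i in range(len(x) - k + 1):
            activation = bias + sum(w * x[i + j] for j, w in enumerate(kernel))
            fmap.append(max(0.0, activation))  # ReLU
        maps.append(fmap)
    return maps


def global_avg_pool(maps: List[List[float]]) -> List[float]:
    """Average each feature map over time, giving one feature per filter."""
    return [sum(m) / len(m) for m in maps]


def linear(features: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [b + sum(w * f for w, f in zip(row, features)) for row, b in zip(weights, biases)]


class TinyQoECNN:
    """Hypothetical toy CNN with randomly initialized (untrained) weights."""

    def __init__(self, n_filters: int = 8, kernel_size: int = 5, input_length: int = 64, seed: int = 0):
        rng = random.Random(seed)
        self.input_length = input_length
        self.kernels = [[rng.gauss(0.0, 0.5) for _ in range(kernel_size)] for _ in range(n_filters)]
        self.conv_bias = [0.0] * n_filters
        # Two regression outputs: index 0 -> BRISQUE, index 1 -> MOS.
        self.head_w = [[rng.gauss(0.0, 0.5) for _ in range(n_filters)] for _ in range(2)]
        self.head_b = [0.0, 0.0]

    def predict(self, packet_sizes: Sequence[int]) -> Tuple[float, float]:
        x = prepare_input(packet_sizes, self.input_length)
        features = global_avg_pool(conv1d_relu(x, self.kernels, self.conv_bias))
        brisque, mos = linear(features, self.head_w, self.head_b)
        return brisque, mos


if __name__ == "__main__":
    # Hypothetical packet sizes (bytes) observed in one direction of a call.
    sizes = [1200, 1180, 300, 1400, 90, 1250, 1300, 60, 1100, 1450]
    model = TinyQoECNN()
    print(model.predict(sizes))  # untrained: values are not meaningful QoE scores
```

Using only packet sizes keeps the input available to an ISP even under end-to-end encryption. Global average pooling lets the same filters respond to traffic patterns wherever they occur in the window. The scaling, window length, and single-layer design are illustrative choices, not values taken from the experiments described here.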