QUIC, a new and increasingly used transport protocol, addresses limitations of TCP by offering improved security and performance, along with features such as stream multiplexing and connection migration. These features, however, also present challenges for network operators who need to monitor and analyze web traffic. In this paper, we introduce VisQUIC, a labeled dataset comprising over 100,000 QUIC traces from more than 44,000 websites (URLs), collected over a four-month period. These traces provide the foundation for generating more than seven million images, with configurable window length, pixel resolution, normalization, and labels. These images enable an observer of the traffic between a client and a server to analyze and gain insights into encrypted QUIC connections. To illustrate the dataset's potential, we present a use case in which an observer estimates the number of HTTP/3 request-response pairs in a given QUIC connection, which can reveal server behavior, client-server interactions, and the load imposed by the observed connection. We formulate this task as a discrete regression problem, train a machine learning (ML) model for it, and evaluate the model on the proposed dataset.
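To make the role of the configurable parameters concrete, the following is a minimal sketch of how one window of a packet trace might be rendered as an image. It is an illustration under stated assumptions, not the VisQUIC generation procedure: the function name `trace_window_to_image`, the two-channel layout (one channel per direction), the packet-length axis, and the `max_len` cap are all hypothetical choices; only the notions of window length, pixel resolution, and normalization come from the abstract.

```python
import numpy as np

def trace_window_to_image(times, lengths, from_server, window_start,
                          window_len=0.1, resolution=32, max_len=1500,
                          normalize=True):
    """Render one time window of a packet trace as a 2-channel image.

    Assumed layout (hypothetical, for illustration only):
      channel 0 = client-to-server packets, channel 1 = server-to-client,
      rows = packet-length bins, columns = time bins within the window.
    """
    times = np.asarray(times, dtype=float)
    # Clip oversized packets into the top length bin instead of dropping them.
    lengths = np.clip(np.asarray(lengths, dtype=float), 0, max_len)
    from_server = np.asarray(from_server, dtype=bool)

    # Keep only packets that fall inside [window_start, window_start + window_len).
    window_end = window_start + window_len
    in_window = (times >= window_start) & (times < window_end)

    time_edges = np.linspace(window_start, window_end, resolution + 1)
    len_edges = np.linspace(0, max_len, resolution + 1)

    image = np.zeros((2, resolution, resolution), dtype=float)
    for channel, direction in enumerate((~from_server, from_server)):
        sel = in_window & direction
        # 2D histogram: packet counts per (length bin, time bin) cell.
        hist, _, _ = np.histogram2d(lengths[sel], times[sel],
                                    bins=(len_edges, time_edges))
        image[channel] = hist

    # Optional normalization: scale pixel values to [0, 1] by the peak count.
    if normalize and image.max() > 0:
        image /= image.max()
    return image

# Toy trace (made-up values): timestamps in seconds, lengths in bytes.
toy_times = [0.00, 0.01, 0.01, 0.09, 0.20]
toy_lengths = [1200, 100, 100, 1500, 500]
toy_from_server = [False, True, True, True, False]

raw = trace_window_to_image(toy_times, toy_lengths, toy_from_server,
                            window_start=0.0, window_len=0.1,
                            resolution=4, normalize=False)
scaled = trace_window_to_image(toy_times, toy_lengths, toy_from_server,
                               window_start=0.0, window_len=0.1,
                               resolution=4, normalize=True)
```

In this sketch, a shorter window or a higher resolution gives finer time granularity per pixel, while normalization trades absolute packet counts for a fixed value range. A label for such an image, in the use case above, would be the number of request-response pairs observed in the same window.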