QUIC, an increasingly adopted transport protocol, addresses limitations of TCP by offering improved security, performance, and features such as stream multiplexing and connection migration. However, these enhancements also introduce challenges for network operators in monitoring and analyzing web traffic, especially due to QUIC's encryption. Existing datasets are inadequate: they are often outdated, lack diversity, anonymize critical information, or exclude essential features like SSL keys, which limits comprehensive research and development in this area. We introduce VisQUIC, a publicly available dataset of over 100,000 labeled QUIC traces with corresponding SSL keys, collected from more than 40,000 websites over four months. By generating visual representations of the traces, we facilitate advanced machine learning (ML) applications and in-depth analysis of encrypted QUIC traffic. To demonstrate the dataset's potential, we estimate the number of HTTP/3 request-response pairs in a QUIC connection using only encrypted traffic, achieving up to 92% accuracy. This estimation provides insights into server behavior, client-server interactions, and connection load, all of which are crucial for tasks like load balancing and intrusion detection. Our dataset enables comprehensive studies of the QUIC and HTTP/3 protocols and supports the development of tools for encrypted traffic analysis.