GenAI chatbots are now pervasive in digital ecosystems, fundamentally reshaping how users interact over the Internet. Their reliance on an always-online, cloud-centric operating model introduces novel traffic dynamics that challenge practical network management. Despite the critical need to anticipate the resulting changes in network demand, the traffic characteristics of these chatbots remain largely unexplored. To fill this gap, this study presents an in-depth traffic analysis of ChatGPT, Copilot, and Gemini, as accessed through their Android mobile apps. Using a dedicated capture architecture, we collect two complementary datasets that combine unconstrained user interactions with a controlled workload of selected prompts for both text and image generation. This dual design allows us to address practical research questions on the distinctiveness of chatbot traffic, its divergence from that of conventional messaging apps, and its novel implications for network usage. To this end, we provide a multi-granular traffic characterization and model packet-sequence dynamics to uncover the underlying transmission mechanisms. Our analysis reveals app- and content-specific traffic patterns and distinctive protocol footprints. We highlight the predominance of TLS, with Gemini extensively leveraging QUIC and ChatGPT exclusively using TLS 1.3, as well as characteristic Server Name Indication (SNI) values. Through occlusion analysis, we quantify how much traffic visibility relies on the SNI, showing that masking this field reduces classification performance by up to 20 percentage points. Finally, the comparison with conventional messaging apps confirms that GenAI workloads introduce novel stress factors, such as sustained upstream activity and high-rate bursts, with direct implications for capacity planning and network management. We publicly release the datasets to support reproducibility and foster extensions to other use cases.
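The occlusion result depends on the SNI being readable in cleartext inside the TLS ClientHello. As a hedged illustration only (the study's actual occlusion procedure and input representation are not described here and may differ, for example by operating on extracted features rather than raw bytes), the Python sketch below locates the host name in a ClientHello record and overwrites it with zero bytes of the same length. The helper names `find_sni`, `mask_sni`, and `client_hello_for` are hypothetical. The parser assumes a single, well-formed, unfragmented ClientHello over TCP without Encrypted Client Hello; for QUIC, the ClientHello travels inside protected Initial packets and would first have to be decrypted.

```python
import ssl
import struct
from typing import Optional, Tuple

SNI_EXTENSION = 0x0000  # server_name extension type (RFC 6066)


def find_sni(record: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, length) of the SNI host name in a TLS ClientHello record, or None."""
    # Record header: content type (0x16 = handshake), legacy version (2), length (2);
    # byte 5 is the handshake message type (0x01 = ClientHello).
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 5 + 4                                           # record header + handshake header
    pos += 2 + 32                                         # legacy_version + random
    pos += 1 + record[pos]                                # session_id
    pos += 2 + struct.unpack_from("!H", record, pos)[0]   # cipher_suites
    pos += 1 + record[pos]                                # compression_methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = min(pos + ext_total, len(record))
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == SNI_EXTENSION:
            # server_name_list length (2), name_type (1, 0 = host_name), host_name length (2)
            if record[pos + 2] != 0:
                return None
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return pos + 5, name_len
        pos += ext_len
    return None


def mask_sni(record: bytes, fill: int = 0x00) -> bytes:
    """Overwrite the SNI host name with `fill` bytes, keeping the record length unchanged."""
    loc = find_sni(record)
    if loc is None:
        return record  # nothing to occlude (not a ClientHello, or no SNI present)
    off, n = loc
    return record[:off] + bytes([fill]) * n + record[off + n:]


def client_hello_for(host: str) -> bytes:
    """Produce a real ClientHello from the local OpenSSL stack, without any network I/O."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=host)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake now waits for a ServerHello that never arrives
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_for("api.example.com")  # placeholder host, not a value from the study
    off, n = find_sni(hello)
    print(hello[off:off + n])  # b'api.example.com'
    masked = mask_sni(hello)
    print(len(masked) == len(hello), masked[off:off + n])
```

Zero-filling instead of deleting the field is a design choice of this sketch: it keeps record and packet lengths intact, so size- and timing-based features stay directly comparable between the original and the occluded traffic.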