Last-mile access networks are often the dominant bottleneck for Internet applications, creating demand for data-generation approaches that are both realistic and reusable. Meeting this goal requires five properties: fidelity (capturing real network behaviors), controllability (systematic variation of network conditions), diversity (coverage of heterogeneous network behaviors), composability (construction of complex scenarios from simpler elements), and replicability (consistent outcomes across runs). Existing approaches satisfy only a subset of these requirements. This paper introduces NETREPLICA, a programmable substrate for last-mile data generation that achieves all five. NETREPLICA decomposes a bottleneck into static attributes (capacity, base latency, buffer size, and shaping and active queue management policies) and dynamic attributes derived from passive traces. It introduces Cross-Traffic Profiles (CTPs), which transform passive production traces into reusable, parameterizable building blocks. By trimming, scaling, and recombining CTPs, NETREPLICA generates realistic yet tunable conditions: it replays non-reactive cross traffic alongside reactive application workloads, enabling reproducible construction of heterogeneous scenarios. In a case study on adaptive bitrate streaming, models trained with NETREPLICA-generated traces reduced transmission-time prediction error by up to 47% in challenging slow-path domains (RTT >= 400 ms, throughput <= 6 Mbps) compared to models trained solely on production traces, demonstrating the utility of NETREPLICA-generated data. Overall, NETREPLICA represents a first step toward a fully programmable data-generation substrate for networking.
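To make the trim, scale, and recombine operations on CTPs more concrete, the following is a minimal Python sketch. It is not NETREPLICA's actual API: the class names `StaticAttributes` and `CrossTrafficProfile`, the binned-load representation, and the methods `trim`, `scale`, `overlay`, and `concat` are all hypothetical, chosen only to illustrate how a passive trace might become a reusable, parameterizable building block.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StaticAttributes:
    """Hypothetical container for the static bottleneck attributes named in the abstract."""
    capacity_mbps: float
    base_latency_ms: float
    buffer_packets: int
    aqm_policy: str = "droptail"  # assumed label, not a documented NETREPLICA option


@dataclass(frozen=True)
class CrossTrafficProfile:
    """Hypothetical CTP: cross-traffic load (Mbps) per fixed-width time bin."""
    bin_ms: int
    load_mbps: List[float]

    def trim(self, start_ms: int, end_ms: int) -> "CrossTrafficProfile":
        """Keep only the bins that fall inside [start_ms, end_ms)."""
        lo, hi = start_ms // self.bin_ms, end_ms // self.bin_ms
        return CrossTrafficProfile(self.bin_ms, self.load_mbps[lo:hi])

    def scale(self, factor: float) -> "CrossTrafficProfile":
        """Multiply the load in every bin by a constant intensity factor."""
        return CrossTrafficProfile(self.bin_ms, [x * factor for x in self.load_mbps])

    def overlay(self, other: "CrossTrafficProfile") -> "CrossTrafficProfile":
        """Recombine by summing two profiles bin by bin; the shorter one is zero-padded."""
        self._check_bins(other)
        n = max(len(self.load_mbps), len(other.load_mbps))
        a = self.load_mbps + [0.0] * (n - len(self.load_mbps))
        b = other.load_mbps + [0.0] * (n - len(other.load_mbps))
        return CrossTrafficProfile(self.bin_ms, [x + y for x, y in zip(a, b)])

    def concat(self, other: "CrossTrafficProfile") -> "CrossTrafficProfile":
        """Recombine by playing one profile after the other."""
        self._check_bins(other)
        return CrossTrafficProfile(self.bin_ms, self.load_mbps + other.load_mbps)

    def _check_bins(self, other: "CrossTrafficProfile") -> None:
        if self.bin_ms != other.bin_ms:
            raise ValueError("profiles must use the same bin width")


# Illustrative values only; they do not come from the paper.
link = StaticAttributes(capacity_mbps=6.0, base_latency_ms=400.0, buffer_packets=100)
base = CrossTrafficProfile(bin_ms=100, load_mbps=[1.0, 2.0, 3.0, 4.0])
scenario = base.trim(100, 300).overlay(base.scale(0.5))
```

In this sketch every operation returns a new immutable profile, so a scenario is an expression over stored profiles and can be rebuilt identically on each run, which is one simple way to think about composability and replicability together.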