The behavior of Internet applications is shaped by congestion dynamics at bottleneck links, yet data capturing application behavior across diverse bottleneck regimes remains scarce. Bridging this gap requires a data-generation substrate that provides controllability, composability, fidelity, and replicability together, a combination that existing approaches struggle to achieve. This paper introduces NetForge, a programmable substrate for bottleneck-centric data generation guided by progressive disaggregation: NetForge (i) decouples bottleneck intent from execution, (ii) separates static bottleneck attributes from dynamic congestion pressure, and (iii) disaggregates observed demand dynamics from their original trace context via Cross-Traffic Profiles (CTPs). CTPs turn passive packet traces into reusable, composable pressure signals that can be selected and transformed to specify dynamic bottleneck behavior. Our evaluation shows that NetForge satisfies all four requirements. In an adaptive bitrate (ABR) case study, the data it generates remains realistic and expands coverage into underrepresented regimes, which in turn improves model performance by up to 47%, measured as the reduction in the Fugu model's transmission-time prediction error. Together, these results establish NetForge as a practical substrate for studying Internet application behavior across diverse bottleneck regimes.
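To make the CTP idea concrete, the sketch below shows one way a passive packet trace could be reduced to a reusable pressure signal and then transformed or composed. The abstract does not specify how NetForge represents CTPs, so everything here is an assumption for illustration: the `CrossTrafficProfile` class, the `profile_from_trace` helper, the fixed-width binning, and the `scale` and `overlay` operations are hypothetical names and choices, not NetForge's API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CrossTrafficProfile:
    """Hypothetical CTP: offered load in bits per second, one sample per time bin."""
    bin_seconds: float
    load_bps: Tuple[float, ...]

    def scale(self, factor: float) -> "CrossTrafficProfile":
        # Transform: raise or lower the congestion pressure uniformly.
        return CrossTrafficProfile(
            self.bin_seconds, tuple(x * factor for x in self.load_bps)
        )

    def overlay(self, other: "CrossTrafficProfile") -> "CrossTrafficProfile":
        # Compose: sum two profiles bin by bin, padding the shorter one with zeros.
        if self.bin_seconds != other.bin_seconds:
            raise ValueError("profiles must share the same bin width")
        n = max(len(self.load_bps), len(other.load_bps))
        a = self.load_bps + (0.0,) * (n - len(self.load_bps))
        b = other.load_bps + (0.0,) * (n - len(other.load_bps))
        return CrossTrafficProfile(self.bin_seconds, tuple(x + y for x, y in zip(a, b)))


def profile_from_trace(
    packets: Iterable[Tuple[float, int]], bin_seconds: float
) -> CrossTrafficProfile:
    """Bin (timestamp_seconds, size_bytes) records into a load time series.

    Timestamps are made relative to the first packet, which detaches the
    demand signal from the absolute time context of the original trace.
    """
    ordered = sorted(packets)
    if not ordered:
        return CrossTrafficProfile(bin_seconds, ())
    t0 = ordered[0][0]
    n_bins = int((ordered[-1][0] - t0) / bin_seconds) + 1
    bits = [0.0] * n_bins
    for ts, size_bytes in ordered:
        bits[int((ts - t0) / bin_seconds)] += size_bytes * 8
    return CrossTrafficProfile(bin_seconds, tuple(b / bin_seconds for b in bits))


# Usage: derive a profile from a toy trace, halve its pressure, add a second source.
trace = [(0.0, 125), (0.5, 125), (1.2, 250)]
ctp = profile_from_trace(trace, bin_seconds=1.0)
combined = ctp.scale(0.5).overlay(CrossTrafficProfile(1.0, (500.0,)))
```

Keeping the profile immutable and the transformations pure means the same derived signal can be reused across experiments without side effects, which is one plausible reading of "reusable, composable" in the abstract.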