The pursuit of high-performance data transfer often focuses on raw network bandwidth, with international links of 100 Gbps or higher frequently regarded as the primary enabler. Such bandwidth is necessary, but the network-centric view is incomplete: it equates provisioned link speed with practical, sustainable data movement capability. Lower-than-desired data rates are commonly observed even on 10 Gbps links and commodity hardware, and higher-speed networks only make the shortfall more visible. We investigate six paradigms, ranging from network latency and TCP congestion control to host-side factors such as CPU performance and virtualization, that critically impact data movement workflows. These paradigms represent widely accepted engineering assumptions that inform system design, procurement decisions, and operational practices in production data movement environments. To address the fidelity gap between raw bandwidth and application-level throughput, we introduce the Drainage Basin Pattern, a conceptual model for reasoning about end-to-end data flow constraints across heterogeneous hardware and software components at varying desired data rates. Our findings are validated through rigorous production-scale deployments, from 10 Gbps links to U.S. DOE ESnet technical evaluations and transcontinental production trials over 100 Gbps operational links. The results demonstrate that the principal bottlenecks often reside outside the network core, and that holistic hardware-software co-design enables consistent, predictable performance for moving data at scale and speed.