The pursuit of high-performance data transfer often focuses on raw network bandwidth, with international links of 100 Gbps or higher frequently regarded as the primary enabler. Bandwidth is necessary, but this network-centric view is incomplete: it equates provisioned link speed with practical, sustainable data movement capability. Lower-than-desired data rates are commonly observed even on 10 Gbps links with commodity hardware, and higher-speed networks only make the shortfall more visible. We investigate six paradigms -- from network latency and TCP congestion control to host-side factors such as CPU performance and virtualization -- that critically impact data movement workflows. These paradigms represent widely accepted engineering assumptions that inform system design, procurement decisions, and operational practices in production data movement environments. To address the fidelity gap between raw bandwidth and application-level throughput, we introduce the Drainage Basin Pattern, a conceptual model for reasoning about end-to-end data flow constraints across heterogeneous hardware and software components at varying desired data rates. Our findings are validated through production-scale deployments, ranging from 10 Gbps links to U.S. DOE ESnet technical evaluations and transcontinental production trials over 100 Gbps operational links. The results demonstrate that the principal bottlenecks often reside outside the network core, and that holistic hardware-software co-design enables consistent, predictable performance when moving data at scale and speed.