We present SPDL (Scalable and Performant Data Loading), an open-source, framework-agnostic library designed to load array data to GPUs efficiently. Data loading is often a bottleneck in AI applications, and it is challenging to optimize because it requires coordinating network calls, CPU-bound tasks, and GPU device transfers. On top of that, Python's GIL (Global Interpreter Lock) makes it difficult to gain performance improvements from multi-threading. We found that when data preprocessing functions release the GIL entirely, they can be executed concurrently in a thread pool, thereby improving workflow performance. Our benchmark shows that, compared to the PyTorch DataLoader, SPDL can iterate through the ImageNet dataset 74% faster while using 38% less CPU and 50GB less memory. When training a ViT-B/16 model, SPDL can send data to the GPU fast enough that the training is not starved. Additionally, when running SPDL on Python 3.13t, without any code changes, throughput improves by a further 33%, thanks to the disabled GIL. SPDL can improve the performance of current AI model training, and it will receive further performance improvements as Free-Threaded Python is adopted in production systems. SPDL is available at https://github.com/facebookresearch/spdl.
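The core idea that GIL-releasing preprocessing functions can run concurrently in a thread pool can be sketched as follows. This is a minimal illustration, not SPDL's actual API: it uses `zlib.decompress` (which releases the GIL for large buffers) as a stand-in for a real decoder such as a JPEG or video decoder.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def preprocess(blob: bytes) -> bytes:
    # zlib.decompress releases the GIL while it runs, so multiple
    # threads can decode buffers in parallel despite the GIL.
    # A real pipeline would decode images/audio here instead.
    return zlib.decompress(blob)

def load_batch(blobs, num_threads=8):
    # Threads share memory, so decoded data needs no pickling or
    # inter-process copies (unlike a multiprocessing-based loader).
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(preprocess, blobs))

if __name__ == "__main__":
    raw = [bytes([i]) * 1_000_000 for i in range(4)]
    compressed = [zlib.compress(b) for b in raw]
    assert load_batch(compressed) == raw
```

Because the workers are threads rather than processes, the same code also benefits automatically from Free-Threaded Python (3.13t), where the GIL is disabled entirely.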