The exponential growth of data has intensified the demand for computational power to train large-scale deep learning models. However, the rapid growth in model size and complexity raises concerns about equal and fair access to computational resources, particularly under tightening energy and infrastructure constraints. GPUs have become essential for accelerating such workloads. This study benchmarks four deep learning models (Conv6, VGG16, ResNet18, and CycleGAN) in TensorFlow and PyTorch on Intel Xeon CPUs and NVIDIA Tesla T4 GPUs. Our experiments show that, compared with CPU training, GPU training achieves average speedups of 11× to 246× depending on model complexity. The lightweight Conv6 model shows the highest acceleration (246×), the mid-sized VGG16 and ResNet18 models achieve 51–116× speedups, and the complex generative model CycleGAN reaches an 11× improvement. In our comparison of PyTorch and TensorFlow, we also observed that TensorFlow's kernel-fusion optimizations reduce inference latency by approximately 15%. Finally, we analyze trends in GPU memory usage and project requirements through 2025 using polynomial regression. Our findings highlight that while GPUs are essential for sustaining AI's growth, democratized and shared access to GPU resources is critical for enabling research innovation at institutions with limited computational budgets.
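The abstract reports two derived quantities: speedup, read here as the ratio of CPU to GPU training time, and a polynomial-regression projection of GPU memory requirements. The sketch below makes both concrete. It is illustrative and rests on several assumptions. The helper names `speedup` and `project_memory` are hypothetical. The quadratic degree is assumed because the abstract does not state one. The data points are synthetic, generated from an exact quadratic, and are not measurements from this study.

```python
import numpy as np


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """GPU speedup as the ratio of CPU to GPU wall-clock training time."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


def project_memory(years, memory_gb, degree=2, target_year=2025):
    """Fit a polynomial (assumed degree) to (year, memory) points and
    evaluate it at target_year. Years are centred on their mean so the
    least-squares problem stays well conditioned."""
    years = np.asarray(years, dtype=float)
    memory_gb = np.asarray(memory_gb, dtype=float)
    offset = years.mean()
    coeffs = np.polyfit(years - offset, memory_gb, degree)
    return float(np.polyval(coeffs, target_year - offset))


# Synthetic points from an exact quadratic (NOT measured data), used only
# to show that the fit recovers the curve and extrapolates it to 2025.
years = np.arange(2016, 2023)
memory = 0.5 * (years - 2016) ** 2 + 2.0 * (years - 2016) + 4.0

print(speedup(3600.0, 60.0))                 # 60.0
print(round(project_memory(years, memory), 6))  # 62.5
```

Centring the years matters in practice. Fitting directly on values near 2000 makes the polynomial's design matrix poorly conditioned. Extrapolating any polynomial beyond the observed range also grows less reliable the further out the target year lies.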