Integer Linear Programming (ILP) is widely used to solve real-world optimization problems, including network routing, map routing, and traffic scheduling. However, ILP workloads are sparse and branch-intensive, which makes them inefficient on conventional CPUs and GPUs. Prior work has shown that large-scale ILP problems can require tens of hours of execution time even on massively parallel systems, limiting their applicability to time-sensitive decision-making. Existing ILP solvers such as Gurobi employ software-level optimizations to handle sparsity on CPUs but remain throughput-limited. GPU-based ILP solvers are likewise constrained, because GPUs are poorly suited to sparse, branch-heavy workloads; the result is thread divergence, under-utilized streaming multiprocessors, and frequent host-device interactions. This paper presents SPARK, a sparsity-aware, reuse-aware, energy-efficient, reconfigurable near-cache ILP accelerator. SPARK repurposes the existing L1 cache in CPUs to provide near-cache acceleration with a minimal hardware overhead of approximately 1.4\% of the CPU area. The architecture performs near-cache sparsity detection and sparsity-aware computation to avoid insignificant computations and reduce data-movement energy. SPARK also exploits computational reuse patterns in ILP algorithms to improve parallelism and efficiency. The design supports both sparse and dense ILPs as well as Linear Programs (LPs). Evaluations on real-world workloads from MIPLIB 2017 show that, for sparse ILPs, SPARK achieves up to 15x and 20x performance improvement and up to 152x and 740x energy reduction compared to AMD Zen3 CPUs and NVIDIA Tesla V100 GPUs, respectively. For sparse LPs, SPARK achieves 7-17x performance improvement and 103-250x energy reduction over the CPU and GPU baselines, demonstrating the broad applicability of the proposed architecture.
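To make the ideas of sparsity-aware computation and computational reuse concrete, the following Python sketch shows them in software for a single simplex-style (Gauss-Jordan) pivot step, the kind of kernel LP relaxations inside ILP solvers repeat many times. It is an illustrative assumption, not SPARK's hardware design: the dense tableau layout, the hypothetical functions `pivot_dense` and `pivot_sparse`, and the multiply-add counting are chosen only to show how skipping zero operands and reusing the pivot row's nonzero pattern cut work while producing the same result.

```python
import copy


def pivot_dense(tableau, pr, pc):
    """Hypothetical dense pivot on (pr, pc); counts every multiply-add."""
    ops = 0
    piv = tableau[pr][pc]
    tableau[pr] = [v / piv for v in tableau[pr]]  # normalize the pivot row
    prow = tableau[pr]
    for i, row in enumerate(tableau):
        if i == pr:
            continue
        f = row[pc]  # elimination factor, captured before the row changes
        for j in range(len(row)):
            row[j] -= f * prow[j]
            ops += 1
    return ops


def pivot_sparse(tableau, pr, pc):
    """Hypothetical sparsity-aware pivot: same result, fewer multiply-adds."""
    ops = 0
    piv = tableau[pr][pc]
    tableau[pr] = [v / piv for v in tableau[pr]]
    prow = tableau[pr]
    # Sparsity detection: find the pivot row's nonzero columns once, then
    # reuse this index list for every row update (computational reuse).
    nz = [j for j, v in enumerate(prow) if v != 0]
    for i, row in enumerate(tableau):
        if i == pr:
            continue
        f = row[pc]
        if f == 0:
            continue  # zero factor: the whole row update is skipped
        for j in nz:  # zero entries of the pivot row are never touched
            row[j] -= f * prow[j]
            ops += 1
    return ops


if __name__ == "__main__":
    T = [[2.0, 0.0, 4.0, 0.0],
         [1.0, 3.0, 0.0, 0.0],
         [0.0, 5.0, 1.0, 2.0]]
    A, B = copy.deepcopy(T), copy.deepcopy(T)
    print(pivot_dense(A, 0, 0))   # 8
    print(pivot_sparse(B, 0, 0))  # 2
    print(A == B)                 # True
```

On this toy tableau the sparse variant performs 2 multiply-adds instead of 8 and yields an identical tableau; the savings grow with the fraction of zeros, which is the effect that near-data sparsity detection targets.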