In this paper, we propose OpenATLib, a general application programming interface (API) for auto-tuning (AT). OpenATLib is designed to make AT functions reusable. Using OpenATLib, we develop Xabclib, a fully auto-tuned sparse iterative solver. Xabclib provides several novel run-time AT functions. First, we implement the following new variants of sparse matrix-vector multiplication (SpMV) for thread processing: (1) an implementation based on non-zero elements; (2) omission of zero-element computation in the vector reduction; (3) branchless segmented scan (BSS). A performance evaluation and comparison with conventional implementations yields the following results: (1) a 14x speedup from the non-zero-elements implementation combined with the omission of zero-element computation for symmetric SpMV; (2) a 4.62x speedup from BSS. We also develop a "numerical computation policy" that can optimize for memory space or computational accuracy. Using the policy, we obtain the following: (1) a reduction in memory space to 1/45 on average; (2) avoidance of the "fault convergence" situation, which is a problem in conventional solvers.
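To make the segmented-scan idea mentioned in the abstract concrete, the following is a minimal, generic sketch of SpMV over a matrix in compressed sparse row (CSR) form, where the per-row sums are obtained from a segmented scan whose inner loop has no data-dependent branch. It is not the BSS implementation of Xabclib and does not use the OpenATLib API; the function name `spmv_segmented_scan` and the argument names `row_ptr`, `col_idx`, and `values` are assumptions chosen for illustration.

```python
def spmv_segmented_scan(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a CSR matrix via a segmented scan (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    nnz = len(values)

    # head[k] = 1 if non-zero k is the first stored element of its row, else 0.
    head = [0] * nnz
    for i in range(n_rows):
        if row_ptr[i] < row_ptr[i + 1]:  # skip empty rows
            head[row_ptr[i]] = 1

    # Segmented inclusive scan over the products values[k] * x[col_idx[k]].
    # Multiplying by (1 - head[k]) resets the running sum at each row start,
    # so the loop body contains no branch on the row structure.
    # Assumption: all inputs are finite (0 * inf would produce NaN).
    scan = [0.0] * nnz
    running = 0.0
    for k in range(nnz):
        running = (1 - head[k]) * running + values[k] * x[col_idx[k]]
        scan[k] = running

    # The row result is the scan value at the last non-zero of each row.
    y = [0.0] * n_rows
    for i in range(n_rows):
        if row_ptr[i] < row_ptr[i + 1]:
            y[i] = scan[row_ptr[i + 1] - 1]
    return y


# Example: A = [[4, 0, 1], [0, 0, 0], [2, 3, 0]] in CSR form.
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
y = spmv_segmented_scan(row_ptr, col_idx, values, x)
```

Because the scan runs over the non-zero elements rather than over rows, the work can be split evenly by non-zero count across threads regardless of how the non-zeros are distributed among rows. A threaded version would additionally need to combine partial sums for rows that straddle a split, which this sequential sketch omits.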