Given a basic block of instructions, finding a schedule that requires the minimum number of registers for evaluation is a well-known problem. The problem is NP-complete when the dependences among instructions form a directed acyclic graph rather than a tree. We seek efficient approximation algorithms for this problem not only because it is an interesting graph optimization problem in theory, but also because a good solution is an essential component in solving the more complex instruction scheduling problem on GPUs. In this paper, we first explain why this problem is important in GPU instruction scheduling. We then explore two approaches to tackling it. First, we model the problem as a constraint-programming problem. Using a state-of-the-art CP-SAT solver, we can find optimal answers for much larger instances than previous work, on a modest desktop PC. Second, guided by the optimal answers, we design and evaluate heuristics that can be applied to polynomial-time list scheduling algorithms. A combination of those heuristics achieves register pressure that is about 17% higher than the optimal minimum on average. However, in nearly 6% of cases the register pressure produced by the heuristic approach is still 50% higher than the optimal minimum.
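To make the problem statement concrete, the following is a minimal Python sketch of what "register pressure of a schedule" can mean on a small dependence DAG. It is not the CP-SAT model or the heuristics evaluated in the paper. The liveness model (each instruction produces one value, the value stays live until its last consumer is issued, and an instruction's result and operands occupy registers at the same time), the example graph, and all function names are assumptions made for illustration. The exhaustive search only works for tiny graphs, and the greedy rule is a generic list-scheduling priority chosen for simplicity.

```python
from itertools import permutations

def successors(preds):
    """Invert a predecessor map {node: (operands...)} into a successor map."""
    succs = {v: [] for v in preds}
    for v, ops in preds.items():
        for p in ops:
            succs[p].append(v)
    return succs

def max_pressure(order, preds, succs):
    """Peak number of live values for a given topological order.

    Assumed model: a value is live from the issue of its instruction until
    its last consumer is issued; operands are assumed distinct.
    """
    remaining = {v: len(succs[v]) for v in order}  # consumers not yet issued
    live = peak = 0
    for v in order:
        live += 1                      # result of v needs a register
        peak = max(peak, live)         # operands and result coexist here
        for p in preds[v]:
            remaining[p] -= 1
            if remaining[p] == 0:      # last use of p: its register is freed
                live -= 1
        if remaining[v] == 0:          # value never used inside the block
            live -= 1
    return peak

def optimal_pressure(preds, succs):
    """Exhaustive minimum over all topological orders (tiny graphs only)."""
    nodes = list(preds)
    best = None
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[p] < pos[v] for v in nodes for p in preds[v]):
            peak = max_pressure(order, preds, succs)
            best = peak if best is None else min(best, peak)
    return best

def greedy_schedule(preds, succs):
    """List scheduling: issue the ready node with the smallest net growth
    in live values (1 for its result minus operands it frees)."""
    remaining = {v: len(succs[v]) for v in preds}
    done, order = set(), []
    while len(order) < len(preds):
        ready = [v for v in preds
                 if v not in done and all(p in done for p in preds[v])]
        def growth(v):
            return 1 - sum(1 for p in preds[v] if remaining[p] == 1)
        v = min(ready, key=growth)     # ties resolved by dictionary order
        for p in preds[v]:
            remaining[p] -= 1
        done.add(v)
        order.append(v)
    return order

# Hypothetical basic block: n = ((a + b) * (c + d)) op a, so `a` is shared.
preds = {
    "a": (), "b": (), "c": (), "d": (),
    "s1": ("a", "b"), "s2": ("c", "d"),
    "m": ("s1", "s2"), "n": ("m", "a"),
}
succs = successors(preds)
opt = optimal_pressure(preds, succs)
greedy_order = greedy_schedule(preds, succs)
greedy_peak = max_pressure(greedy_order, preds, succs)
print("optimal pressure:", opt)
print("greedy order:", greedy_order, "pressure:", greedy_peak)
```

Comparing `greedy_peak` with `opt` on small graphs like this one mirrors, in miniature, the comparison between heuristic and optimal register pressure described above. A different liveness convention, for example letting a result reuse the register of an operand that dies at the same instruction, would change the absolute numbers.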