Given a basic block of instructions, finding a schedule that requires the minimum number of registers for evaluation is a well-known problem. The problem is NP-complete when the dependences among instructions form a directed acyclic graph instead of a tree. We seek efficient approximation algorithms for this problem not only because it is an interesting graph optimization problem in theory, but also because a good solution is an essential component in solving the more complex instruction scheduling problem on GPUs. In this paper, we first explain why this problem is important in GPU instruction scheduling. We then explore two approaches to tackling it. First, we model the problem as a constraint-programming problem. Using a state-of-the-art CP-SAT solver on a modest desktop PC, we can find optimal answers for much larger cases than previous work. Second, guided by the optimal answers, we design and evaluate heuristics that can be applied to polynomial-time list scheduling algorithms. A combination of those heuristics achieves register pressure that is about 17\% higher than the optimal minimum on average. However, in nearly 6\% of cases the register pressure produced by the heuristic approach is still 50\% higher than the optimal minimum.
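To make the problem statement concrete, the following minimal Python sketch computes the peak register pressure of a given schedule and finds the minimum over all valid schedules by exhaustive search. Everything in it is an illustrative assumption rather than the paper's model: the liveness rule (each instruction result occupies one register, a value dies once its last consumer is scheduled, and a result does not reuse an operand's register), the function names, and the eight-instruction DAG. The exhaustive search is only a stand-in for an exact solver on tiny inputs; it is not the CP-SAT formulation.

```python
from itertools import permutations

def peak_pressure(order, preds):
    """Peak number of simultaneously live values for one schedule (assumed model)."""
    # Count the consumers of each value that are not yet scheduled.
    remaining = {v: 0 for v in preds}
    for operands in preds.values():
        for p in operands:
            remaining[p] += 1
    live, peak = set(), 0
    for v in order:
        live.add(v)                  # the result of v needs a register
        peak = max(peak, len(live))  # operands are still live at this point
        for p in preds[v]:
            remaining[p] -= 1
            if remaining[p] == 0:
                live.discard(p)      # last use of p has now been scheduled
        if remaining[v] == 0:
            live.discard(v)          # assumption: unused results die immediately
    return peak

def is_valid(order, preds):
    """True if every instruction appears after all of its operands."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[p] < pos[v] for v in preds for p in preds[v])

def min_pressure_bruteforce(preds):
    """Exhaustive search over schedules; feasible only for very small blocks."""
    best = None
    for order in permutations(preds):
        if is_valid(order, preds):
            pressure = peak_pressure(order, preds)
            if best is None or pressure < best[0]:
                best = (pressure, order)
    return best

# Hypothetical basic block: a..d are loads; value a is used twice (by e and h),
# so the dependences form a DAG rather than a tree.
preds = {
    "a": (), "b": (), "c": (), "d": (),
    "e": ("a", "b"),
    "f": ("c", "d"),
    "g": ("e", "f"),
    "h": ("g", "a"),
}

naive = ("a", "b", "c", "d", "e", "f", "g", "h")
print(peak_pressure(naive, preds))        # 5 under the assumed model
print(min_pressure_bruteforce(preds)[0])  # 4 under the assumed model
```

In this toy block, issuing all loads first costs five registers, while finishing the `c`, `d`, `f` subexpression before loading `a` and `b` needs only four. The search visits every permutation, so it grows factorially with block size, which is why exact solvers and polynomial-time heuristics are needed for realistic blocks.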