Given a basic block of instructions, finding a schedule that requires the minimum number of registers for evaluation is a well-known problem. The problem is NP-complete when the dependences among instructions form a directed acyclic graph (DAG) rather than a tree. We seek efficient approximation algorithms for this problem not only because it is an interesting graph-optimization problem in theory, but also because a good solution to it is an essential component of the more complex instruction-scheduling problem on GPUs. In this paper, we first explain why this problem is important in GPU instruction scheduling. We then explore two approaches to it. First, we model the problem as a constraint-programming (CP) problem; using a state-of-the-art CP-SAT solver, we can find optimal answers for much larger cases than previous work, on a modest desktop PC. Second, guided by these optimal answers, we design and evaluate heuristics that can be applied to polynomial-time list-scheduling algorithms. A combination of these heuristics achieves register pressure that is, on average, about 17\% higher than the optimal minimum. However, in nearly 6\% of cases, the register pressure produced by the heuristic approach is still 50\% higher than the optimal minimum.
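Both approaches answer the same underlying question: how many registers does a given instruction order need? The following minimal Python sketch illustrates that measurement together with a greedy list scheduler. It assumes that each instruction defines exactly one value, that an operand's register can be reused by its last reader, and that values with no readers stay live to the end of the block. The `net_change` rule is a generic greedy heuristic chosen for illustration, not one of the heuristics designed in the paper, and all names (`deps`, `users_of`, `register_pressure`, `list_schedule`) are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical DAG encoding: deps[v] lists the instructions whose results v reads.
Deps = Dict[str, List[str]]


def users_of(deps: Deps) -> Dict[str, List[str]]:
    """Invert the DAG: map each value to the distinct instructions that read it."""
    users: Dict[str, List[str]] = {v: [] for v in deps}
    for v, operands in deps.items():
        for u in set(operands):  # a value read twice by v still counts once
            users[u].append(v)
    return users


def register_pressure(deps: Deps, order: Sequence[str]) -> int:
    """Peak number of simultaneously live values for a schedule.

    Assumed model: each value is live from the step that defines it up to
    (but not including) the step of its last reader, so the last reader may
    reuse that register. Values with no readers stay live to the block's end.
    """
    pos = {v: i for i, v in enumerate(order)}
    for v, operands in deps.items():
        for u in operands:
            if pos[u] >= pos[v]:
                raise ValueError(f"{v} is scheduled before its operand {u}")
    users = users_of(deps)
    n = len(order)
    delta = [0] * (n + 1)  # difference array over schedule steps
    for v in order:
        delta[pos[v]] += 1
        delta[max((pos[w] for w in users[v]), default=n)] -= 1
    peak = live = 0
    for step in range(n):
        live += delta[step]
        peak = max(peak, live)
    return peak


def list_schedule(deps: Deps) -> List[str]:
    """Greedy list scheduling with one illustrative register-aware heuristic:
    issue the ready instruction that changes the live-value count the least
    (one new value minus the operands it reads for the last time); break ties
    by the instruction's original position in deps."""
    users = users_of(deps)
    index = {v: i for i, v in enumerate(deps)}
    readers_left = {v: len(users[v]) for v in deps}
    preds_left = {v: len(set(deps[v])) for v in deps}

    def net_change(v: str) -> int:
        freed = sum(1 for u in set(deps[v]) if readers_left[u] == 1)
        return 1 - freed

    ready = [v for v in deps if preds_left[v] == 0]
    order: List[str] = []
    while ready:
        v = min(ready, key=lambda x: (net_change(x), index[x]))
        ready.remove(v)
        order.append(v)
        for u in set(deps[v]):
            readers_left[u] -= 1  # v has now consumed this operand
        for w in users[v]:
            preds_left[w] -= 1
            if preds_left[w] == 0:  # all operands of w are now available
                ready.append(w)
    return order


deps = {"a": [], "b": [], "c": ["a", "b"],
        "d": [], "e": [], "f": ["d", "e"], "g": ["c", "f"]}
print(register_pressure(deps, ["a", "b", "d", "e", "c", "f", "g"]))  # 4
sched = list_schedule(deps)
print(sched, register_pressure(deps, sched))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g'] 3
```

On this small tree, issuing all four leaves first needs 4 registers. The greedy order finishes one subtree before starting the next and needs only 3. Because the rule is purely greedy, it offers no optimality guarantee on general DAGs. That gap is what comparing heuristic results against CP-SAT optima measures.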