Given a basic block of instructions, finding a schedule that requires the minimum number of registers for evaluation is a well-known problem. The problem is NP-complete when the dependences among instructions form a directed acyclic graph (DAG) rather than a tree. We seek efficient approximation algorithms for this problem not only because it is an interesting graph optimization problem in theory, but also because a good solution is an essential component in solving the more complex instruction scheduling problem on GPUs. In this paper, we first explain why this problem is important in GPU instruction scheduling. We then explore two approaches to tackling it. First, we model the problem as a constraint-programming problem. Using a state-of-the-art CP-SAT solver on a modest desktop PC, we can find optimal answers for much larger cases than previous work. Second, guided by the optimal answers, we design and evaluate heuristics that can be applied to polynomial-time list scheduling algorithms. A combination of those heuristics achieves register pressure that is about 16% higher than the optimal minimum on average. However, in nearly 3% of cases the register pressure produced by the heuristic approach is still 50% higher than the optimal minimum.
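To make the problem statement concrete, the following is a minimal sketch of how the register pressure of a schedule might be measured on a small dependence DAG. It is not the CP-SAT model or the heuristics evaluated in the paper. The liveness model is an assumption: each instruction defines one value, a value is freed when its last user is scheduled (so a result may reuse an operand's register), and values without users stay live to the end of the block. The names `simulate`, `max_pressure`, `optimal_pressure`, and `greedy_schedule` are hypothetical, and the greedy rule (pick the ready instruction that leaves the fewest live values) is only one simple illustrative choice for a list scheduler.

```python
from itertools import permutations

def simulate(order, preds):
    """Run `order`; return (peak live values, live values at the end).

    preds[i] lists the operands (predecessors) of instruction i.
    """
    users = {i: set() for i in preds}
    for i, ops in preds.items():
        for p in ops:
            users[p].add(i)
    done, live, peak = set(), set(), 0
    for i in order:
        if any(p not in done for p in preds[i]):
            raise ValueError(f"{i} scheduled before one of its operands")
        done.add(i)
        # Assumed model: an operand dies once all of its users have run.
        live -= {p for p in preds[i] if users[p] <= done}
        live.add(i)  # the result of i now occupies a register
        peak = max(peak, len(live))
    return peak, len(live)

def max_pressure(order, preds):
    """Peak register pressure of a complete or partial schedule."""
    return simulate(order, preds)[0]

def optimal_pressure(preds):
    """Exhaustive search over all topological orders (tiny blocks only)."""
    best = None
    for order in permutations(preds):
        pos = {n: k for k, n in enumerate(order)}
        if all(pos[p] < pos[i] for i in preds for p in preds[i]):
            peak = max_pressure(order, preds)
            best = peak if best is None else min(best, peak)
    return best

def greedy_schedule(preds):
    """List scheduling with an illustrative priority: fewest live values next."""
    order, done = [], set()
    while len(order) < len(preds):
        ready = [i for i in preds
                 if i not in done and all(p in done for p in preds[i])]
        pick = min(ready, key=lambda c: (simulate(order + [c], preds)[1], c))
        order.append(pick)
        done.add(pick)
    return order

# Hypothetical block: c = f(a, b); e = g(c, d)
block = {"a": [], "b": [], "c": ["a", "b"], "d": [], "e": ["c", "d"]}
print(max_pressure(["a", "b", "c", "d", "e"], block))  # 2
print(max_pressure(["a", "b", "d", "c", "e"], block))  # 3
print(optimal_pressure(block))                         # 2
```

The two hand-written orders differ only in when `d` is issued, yet their peak pressure differs by one register, which is the effect a register-minimizing scheduler tries to control. The exhaustive search is factorial in the block size and is only meant to show what "optimal minimum" refers to on toy inputs.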