Binary size reduction is an increasingly important optimization objective for compilers. One emerging technique is function merging, where multiple similar functions are merged into one, thereby eliminating redundancy. The state-of-the-art approach to merging is based on sequence alignment: functions are viewed as linear sequences of instructions, which are then matched so as to maximize their alignment. In this paper, we consider a significantly generalized formulation of the problem that allows reordering of branches within each function, which in turn allows more flexible matching and better merging. We show that this makes the problem NP-hard, and we therefore study it through the lens of parameterized algorithms and complexity, identifying parameters of the input that govern its complexity. We look at two natural parameters: the branching factor and the nesting depth of the input functions. Concretely, our input consists of two functions $F_1, F_2$, where each $F_i$ has size $n_i$, branching factor $b_i$, and nesting depth $d_i$. Our task is to reorder the branches of $F_1$ and $F_2$ in a way that yields linearizations achieving the maximum sequence alignment. Let $n=\max(n_1, n_2)$, and define $b, d$ similarly. Our results are as follows:
- A simple algorithm running in time $2^{O(bd)} n^2$, establishing that the problem is fixed-parameter tractable (FPT) with respect to all four parameters $b_1, d_1, b_2, d_2$.
- An algorithm running in time $2^{O(bd_2)} n^7$, showing that the problem remains in FPT even when one of the functions has unbounded nesting depth.
- A hardness result showing that the problem is NP-hard even when constrained to constant $d_1, b_2, d_2$.
To the best of our knowledge, this is the first systematic study of function merging with branch reordering from an algorithmic or complexity-theoretic perspective.
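To make the problem statement concrete, the following is a minimal brute-force sketch, not one of the algorithms announced above. It rests on assumptions that the abstract does not fix: a function is modeled as a nested tuple in which a string is an instruction and a tuple of arms is a branch, a linearization concatenates the arms of each branch in some chosen order, and alignment quality is measured by the length of the longest common subsequence. The names `linearizations`, `lcs_length`, and `best_alignment` are hypothetical.

```python
from itertools import permutations, product

# Assumed model: a body is a tuple of items. An item is either an
# instruction (str) or a branch, i.e. a tuple of arms, each arm a body.

def linearizations(body):
    """Yield every instruction sequence obtainable by reordering branch arms."""
    if not body:
        yield ()
        return
    head, rest = body[0], body[1:]
    if isinstance(head, str):
        heads = [(head,)]
    else:
        heads = []
        for order in permutations(head):  # try every order of the arms
            arm_choices = [list(linearizations(arm)) for arm in order]
            for parts in product(*arm_choices):
                heads.append(tuple(ins for part in parts for ins in part))
    for h in heads:
        for t in linearizations(rest):
            yield h + t

def lcs_length(a, b):
    """Longest-common-subsequence length, used here as the alignment score."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def best_alignment(f1, f2):
    """Exhaustively search all pairs of linearizations (exponential time)."""
    return max(lcs_length(a, b)
               for a in set(linearizations(f1))
               for b in set(linearizations(f2)))

# Two toy functions whose branch arms appear in opposite orders.
f1 = ("load", (("add", "store"), ("mul",)), "ret")
f2 = ("load", (("mul",), ("add", "store")), "ret")

as_written = lcs_length(next(linearizations(f1)), next(linearizations(f2)))
print(as_written)              # 4: alignment with the arms left as written
print(best_alignment(f1, f2))  # 5: reordering the arms aligns everything
```

The search enumerates every arm order at every branch, so its cost grows factorially with the branching factor and exponentially with the number of branches. That blow-up is what the parameterized bounds in terms of $b$ and $d$ are meant to bring under control.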