Hypergraph partitioning is a recurring NP-hard problem in engineering, and solving it efficiently at scale hinges on parallelism. This work proposes a GPU-centric algorithm for multi-level hypergraph partitioning under a specific set of constraints: a limit on both the size and the number of distinct inbound hyperedges of each partition. Manipulating hypergraphs requires deeply nested traversals and concurrent decision-making, and our constraints add further set operations on top of that. We therefore design our algorithms around the GPU's hierarchical parallelism and the specifics of our problem. When forming partitions, we materialize the hypergraph's incidence structure and unique neighborhoods in memory to exploit set sparsity and to batch node-pairing scores in shared memory. When refining partitions, we chain node moves into improving paths and cycles, checking their validity via cumulative set-size variations reduced in parallel over the moves. As a result, our dominant kernels exhibit a span linear in local hypergraph parameters. Results show an average 380x speedup and a 1.2-2.0x reduction in connectivity compared to a sequential multi-level partitioner. With minor changes, we also support k-way balanced partitioning: it runs 5x faster than CPU methods with a ~5% quality loss for k=2, and outperforms an existing GPU partitioner at comparable runtime, with no measurable overhead from the added constraint-handling logic.
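To make the refinement idea concrete, the following is a minimal sequential sketch of how a chain of node moves could be validated through cumulative size variations. It is an illustration under stated assumptions, not the authors' kernel: the function name `valid_prefix_lengths`, the `(weight, src, dst)` move encoding, and the single `max_size` bound are all hypothetical, and only the size constraint is modeled (the limit on distinct inbound hyperedges would need an analogous set-based check). The prefix sum computed here with `itertools.accumulate` stands in for what would be a parallel scan or reduction over moves on a GPU.

```python
from itertools import accumulate


def valid_prefix_lengths(moves, sizes, max_size):
    """Return the chain lengths whose cumulative effect respects max_size.

    moves    -- list of (weight, src, dst) tuples, applied as one batch
                (hypothetical encoding, assumed for this sketch)
    sizes    -- dict mapping partition id to its current size
    max_size -- assumed upper bound on the size of every partition
    """
    # Only partitions touched by some move can change size.
    parts = sorted({p for _, src, dst in moves for p in (src, dst)})
    ok = [True] * len(moves)
    for p in parts:
        # Signed size change that each move causes in partition p.
        deltas = [
            (w if dst == p else 0) - (w if src == p else 0)
            for w, src, dst in moves
        ]
        # Inclusive prefix sum of the deltas: the size variation of p
        # after applying the first i+1 moves together.
        for i, cum in enumerate(accumulate(deltas)):
            if sizes[p] + cum > max_size:
                ok[i] = False
    return [i + 1 for i, good in enumerate(ok) if good]


# A two-move cycle between two full partitions: the first move alone
# overflows B, but the full swap leaves both sizes unchanged.
sizes = {"A": 4, "B": 4}
swap = [(1, "A", "B"), (1, "B", "A")]
print(valid_prefix_lengths(swap, sizes, max_size=4))  # → [2]
```

The example shows why validity is evaluated on cumulative variations rather than move by move: a cycle can be feasible as a whole even when none of its shorter prefixes is.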