The Low Autocorrelation Binary Sequences (LABS) problem is a particularly challenging binary optimization problem: finding the global optimum quickly becomes intractable for problem sizes beyond 66. This makes LABS an appealing test-bed for meta-heuristic optimization solvers targeting large problem sizes. In this work, we introduce a massively parallelized implementation of the memetic tabu search algorithm to tackle the LABS problem for sizes up to 120. By effectively combining block-level and thread-level parallelism within a single Nvidia A100 GPU, and creating hyper-optimized binary-valued data structures in GPU shared memory, we demonstrate up to a 26-fold speedup over an analogous 16-core CPU implementation. Our implementation has also enabled us to find new best-known LABS merit factor values for twelve problem sizes between 92 and 118. Crucially, we also report improved values for two odd-sized instances, {99, 107}, whose previous best-known results coincided with the provably optimal skew-symmetric search sequences. Consequently, our results highlight the importance of general-purpose solvers for LABS, since restricting the search to skew-symmetric sequences can lead to sub-optimal solutions.
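For reference, the merit factor in question is the standard LABS objective: for a sequence s in {+1, -1}^N with aperiodic autocorrelations C_k = sum_{i=1}^{N-k} s_i s_{i+k}, the energy is E = sum_{k=1}^{N-1} C_k^2 and the merit factor is F = N^2 / (2E). The minimal CUDA sketch below is not the authors' optimized implementation; the kernel name labsEnergy, the fixed N, the toy sequence, and the one-thread-per-lag layout are all illustrative assumptions. It only shows how thread-level parallelism and shared memory map naturally onto this objective, with each thread computing one lag C_k and the squared terms reduced to E in shared memory.

```cuda
// Hypothetical sketch: evaluate the LABS energy E and merit factor F of a
// +/-1 sequence, one CUDA thread per autocorrelation lag k. This is NOT the
// paper's solver; names and layout are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // sequence length (illustrative; the paper targets up to 120)

__global__ void labsEnergy(const int *s, long long *energy) {
    __shared__ long long partial[N];      // per-lag contributions C_k^2
    int k = threadIdx.x + 1;              // lags k = 1 .. N-1
    long long c = 0;
    if (k < N) {
        for (int i = 0; i + k < N; ++i)   // C_k = sum_i s_i * s_{i+k}
            c += s[i] * s[i + k];
    }
    partial[threadIdx.x] = c * c;
    __syncthreads();
    if (threadIdx.x == 0) {               // serial reduction, kept simple
        long long e = 0;
        for (int t = 0; t < N - 1; ++t) e += partial[t];
        *energy = e;                      // E = sum_k C_k^2
    }
}

int main() {
    int h_s[N];
    for (int i = 0; i < N; ++i) h_s[i] = (i % 3 == 0) ? -1 : 1;  // toy sequence
    int *d_s; long long *d_e, h_e;
    cudaMalloc(&d_s, sizeof(h_s));
    cudaMalloc(&d_e, sizeof(long long));
    cudaMemcpy(d_s, h_s, sizeof(h_s), cudaMemcpyHostToDevice);
    labsEnergy<<<1, N - 1>>>(d_s, d_e);   // one thread per lag
    cudaMemcpy(&h_e, d_e, sizeof(long long), cudaMemcpyDeviceToHost);
    printf("E = %lld, merit factor F = %.4f\n",
           h_e, (double)(N * N) / (2.0 * h_e));
    cudaFree(d_s); cudaFree(d_e);
    return 0;
}
```

A real tabu-search solver would not recompute every C_k from scratch as above; maintaining the autocorrelation table incrementally under single-bit flips brings the cost of evaluating one move down from O(N^2) to O(N), which is what makes massively parallel local search over many candidate flips attractive on a GPU.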