Finding deletion-correcting codes of maximum size has been an open problem for over 70 years, even for a single deletion. We adapt FunSearch, a large language model (LLM)-guided evolutionary search, to discover functions that construct deletion-correcting codes at short code lengths. For a single deletion, our search finds a function that we prove constructs the conjectured-optimal Varshamov-Tenengolts code. For multiple deletions and quaternary edit codes, the discovered functions improve on prior explicit, search-based, and neural constructions but remain empirical heuristics without new theoretical insights. We study design choices for LLM-guided evolutionary search and find that, for our problem, compute is better allocated to sampling more functions than to longer reasoning traces per function, and that co-evolving natural language descriptions with code hurts search quality. We propose deduplicating logically identical functions during evolution, which we find critical for search diversity. Our results demonstrate the potential of LLM-guided evolutionary search for information theory and code design and represent the first application of such methods for constructing error-correcting codes. However, in our current formulation, evaluating a function scales exponentially with code length, limiting the approach to short codes.
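For concreteness, the following is a minimal sketch of the single-deletion setting described above. It assumes the standard textbook definition of the binary Varshamov-Tenengolts code, VT_a(n) = {x in {0,1}^n : sum_i i·x_i ≡ a (mod n+1)}, and checks the deletion-correcting property by brute force. It is not the function found by the search, and the names `vt_code`, `deletion_ball`, and `corrects_deletions` are hypothetical helpers introduced only for illustration.

```python
from itertools import product


def vt_code(n, a=0):
    """Binary words x of length n with sum(i * x_i) ≡ a (mod n + 1), positions 1-indexed."""
    return [
        x
        for x in product((0, 1), repeat=n)
        if sum(i * bit for i, bit in enumerate(x, start=1)) % (n + 1) == a
    ]


def deletion_ball(x, s=1):
    """All words obtained from the tuple x by deleting exactly s symbols."""
    words = {x}
    for _ in range(s):
        words = {w[:i] + w[i + 1:] for w in words for i in range(len(w))}
    return words


def corrects_deletions(code, s=1):
    """True if the s-deletion balls of distinct codewords are pairwise disjoint."""
    seen = set()
    for codeword in code:
        ball = deletion_ball(codeword, s)
        if seen & ball:  # some shortened word is reachable from two codewords
            return False
        seen |= ball
    return True


if __name__ == "__main__":
    for n in range(2, 9):
        code = vt_code(n)
        print(n, len(code), corrects_deletions(code))
```

Because `vt_code` enumerates all 2^n binary words, this check is only practical at short lengths, which mirrors the exponential evaluation cost noted in the abstract.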