Existing Programming-By-Example (PBE) systems often rely on simplified benchmarks that fail to capture the high structural complexity of real-world regexes, such as deeper nesting and frequent unions. To overcome the resulting performance drop, we propose ReSyn, a synthesizer-agnostic divide-and-conquer framework that decomposes complex synthesis problems into manageable sub-problems. We also introduce Set2Regex, a parameter-efficient synthesizer that captures the permutation invariance of examples. Experimental results demonstrate that ReSyn significantly boosts accuracy across various synthesizers, and that its combination with Set2Regex establishes a new state of the art on a challenging real-world benchmark.
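To make the notion of permutation invariance concrete, the following is a minimal sketch of an order-independent encoding of a set of examples. It is not the Set2Regex architecture, which the abstract does not describe; the functions `embed` and `encode_set`, the bucketed character histogram, and the choice of sum pooling are all hypothetical and chosen only to illustrate the property.

```python
from itertools import permutations


def embed(example: str, dim: int = 8) -> list[float]:
    """Toy per-example embedding (an assumption for illustration):
    a character histogram folded into `dim` buckets."""
    vec = [0.0] * dim
    for ch in example:
        vec[ord(ch) % dim] += 1.0
    return vec


def encode_set(examples: list[str], dim: int = 8) -> list[float]:
    """Sum-pool the per-example embeddings.

    Each example is embedded independently and addition is commutative,
    so the result does not depend on the order of `examples`.
    """
    pooled = [0.0] * dim
    for ex in examples:
        for i, value in enumerate(embed(ex, dim)):
            pooled[i] += value
    return pooled


# Hypothetical positive examples for some target regex.
examples = ["ab12", "cd34", "ef56"]
reference = encode_set(examples)

# Every ordering of the same examples yields the same encoding.
order_independent = all(
    encode_set(list(p)) == reference for p in permutations(examples)
)
```

A sequence model that reads the examples as one concatenated string would, in general, produce different encodings for different orderings; pooling over independently embedded examples is one simple way to avoid that.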