Inductive programming frequently relies on some form of search to identify candidate solutions. However, the size of the search space limits inductive programming to the production of relatively small programs. If we could correctly predict the subset of instructions required for a given problem, inductive programming would become more tractable. We show that this can be achieved in a high percentage of cases. This paper presents a novel model of programming language instruction co-occurrence that was built to support search space partitioning in the Zoea distributed inductive programming system. The model consists of a collection of intersecting instruction subsets derived from a large sample of open source code. Using this approach, different parts of the search space can be explored in parallel. The number of subsets required grows sub-linearly with the quantity of code used to produce them, and a manageable number of subsets is sufficient to cover a high percentage of unseen code. The approach also significantly reduces the overall size of the search space, often by many orders of magnitude.
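As an illustration of covering unseen code with instruction subsets, the following is a minimal sketch and not the Zoea implementation. The instruction names, the subset contents, the sample programs, and the sizes 200 and 20 are all hypothetical. It also assumes a toy measure of search space size (the number of instruction sequences of a fixed length), which the paper may define differently.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical instruction subsets; in the paper these are derived from open source code.
SUBSETS: List[FrozenSet[str]] = [
    frozenset({"add", "mul", "sum", "len"}),
    frozenset({"len", "split", "join", "upper"}),
]

# Hypothetical "unseen" programs, each reduced to the set of instructions it uses.
PROGRAMS: List[FrozenSet[str]] = [
    frozenset({"add", "sum"}),
    frozenset({"split", "join"}),
    frozenset({"add", "split"}),  # spans two subsets, so no single subset covers it
]


def covering_subsets(
    required: Iterable[str], subsets: List[FrozenSet[str]]
) -> List[FrozenSet[str]]:
    """Return every subset that contains all of the required instructions."""
    need = frozenset(required)
    return [s for s in subsets if need <= s]


def coverage(programs: Iterable[Iterable[str]], subsets: List[FrozenSet[str]]) -> float:
    """Fraction of programs whose instructions fit inside at least one subset."""
    programs = list(programs)
    if not programs:
        return 0.0
    hits = sum(1 for p in programs if covering_subsets(p, subsets))
    return hits / len(programs)


def search_space_size(num_instructions: int, program_length: int) -> int:
    """Toy measure: number of instruction sequences of exactly program_length."""
    return num_instructions ** program_length


if __name__ == "__main__":
    print(f"coverage: {coverage(PROGRAMS, SUBSETS):.2f}")
    # Assumed sizes: a full language of 200 instructions versus one 20-instruction subset.
    full = search_space_size(200, 10)
    part = search_space_size(20, 10)
    print(f"reduction factor per subset: {full // part:.0e}")
```

Under these assumptions each subset can be handed to a separate worker, which is how the partitioning permits parallel exploration. The total work is the per-subset size multiplied by the number of subsets, so the reduction only holds while the subset count stays manageable.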