Overlapping instruction subsets derived from human-originated code have previously been shown to dramatically shrink the inductive programming search space, often by many orders of magnitude. Here we extend the instruction subset approach to consider direct instruction-instruction applications (or instruction digrams) as an additional search heuristic for inductive programming. In this study we analyse the frequency distribution of instruction digrams in a large sample of open-source code. The analysis indicates that the instruction digram distribution is highly skewed, with over 93% of possible instruction digrams not represented in the code sample. We demonstrate that instruction digrams can be used to constrain instruction selection during search, further reducing the size of the search space, in some cases by several orders of magnitude. This significantly increases the size of programs that can be generated using search-based inductive programming techniques. We discuss the results and provide some suggestions for further work.
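As an illustration of how instruction digrams might constrain instruction selection, the following is a minimal Python sketch rather than the method used in the study. It assumes that a digram can be approximated as a pair of adjacent instructions in a linear instruction sequence, which is a simplification of direct instruction-instruction application. The toy corpus, the instruction names, and the helper functions `count_digrams` and `constrained_sequences` are all hypothetical.

```python
from collections import Counter


def count_digrams(programs):
    """Count adjacent instruction pairs (digrams) across a corpus of sequences."""
    counts = Counter()
    for prog in programs:
        counts.update(zip(prog, prog[1:]))
    return counts


def constrained_sequences(instructions, allowed, length):
    """Enumerate sequences of the given length in which every adjacent
    pair of instructions is an observed (allowed) digram."""
    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        for ins in instructions:
            # The first instruction is unconstrained; later ones must
            # form an observed digram with their predecessor.
            if not prefix or (prefix[-1], ins) in allowed:
                yield from extend(prefix + (ins,))
    yield from extend(())


# Hypothetical toy corpus standing in for a sample of human-written code.
corpus = [
    ["load", "add", "store"],
    ["load", "mul", "store"],
    ["load", "add", "add", "store"],
]

digrams = count_digrams(corpus)
instructions = sorted({ins for prog in corpus for ins in prog})

possible = len(instructions) ** 2           # all ordered instruction pairs
unseen = possible - len(digrams)            # pairs never observed in the corpus
unconstrained = len(instructions) ** 3      # all sequences of length 3
constrained = sum(1 for _ in constrained_sequences(instructions, set(digrams), 3))

print(f"unseen digrams: {unseen}/{possible}")
print(f"length-3 search space: {unconstrained} unconstrained, {constrained} constrained")
```

In this sketch the pruning happens at selection time: a candidate instruction is only considered if it forms an observed digram with its predecessor, so branches that would never appear in the sample are never expanded. The reduction on a toy corpus says nothing about the magnitudes reported above; it only shows the mechanism.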