Genetic programming (GP) is based on two important insights. First, any learning task can fundamentally be posed as a program induction problem, where the goal is to construct a symbolic hierarchical model expressed as a syntax tree. Second, this task can be posed as a search problem, with evolution used to locate the desired model. Since it was proposed, GP has produced notable results across a wide range of tasks and problem domains. This work presents an alternative view by modifying the second core insight of GP, posing the problem as a syntactic derivation task instead. In particular, this paper presents Minimalist Genetic Programming (MGP), an algorithm that, like GP, is biologically inspired but, instead of evolution, takes inspiration from the Minimalist Program, an approach to human language in which syntax is understood as an optimal solution to the problem of linking two other mental systems. In minimalism, the core computational process is a binary set-formation operator called $MERGE$, which can be used to incrementally construct complex syntactic structures through a simple Markovian process. MGP is able to discover the core building blocks of symbolic expressions and to incrementally combine them using $MERGE$. The proposed system is benchmarked on symbolic regression tasks that are known to be difficult for standard GP systems because of GP's propensity for bloat. Results show that when a proper lexicon of atomic syntactic objects is chosen, MGP consistently produces the exact ground-truth model on a set of symbolic regression tasks where standard GP struggles to do the same. The insights provided by minimalism are shown to be relevant to the problem of program induction and, given the potential MGP exhibits in this work, merit further exploration.
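The abstract describes $MERGE$ only at a high level. As a minimal sketch, assuming the standard minimalist definition $MERGE(X, Y) = \{X, Y\}$, the Python code below models a syntactic object as either an atomic lexical item (a string) or a frozenset built by merging two objects, and runs a toy random derivation in which each step depends only on the current workspace, which is what makes it Markovian. The function names `merge`, `derive`, and `leaves`, the uniform random choice of objects to merge, and the example lexicon are all hypothetical illustrations, not the MGP algorithm itself.

```python
import random
from typing import FrozenSet, List, Set, Union

# A syntactic object is either an atomic lexical item (a string)
# or an unordered set produced by MERGE from two syntactic objects.
SyntacticObject = Union[str, FrozenSet["SyntacticObject"]]


def merge(x: SyntacticObject, y: SyntacticObject) -> FrozenSet[SyntacticObject]:
    """MERGE as binary set formation: MERGE(X, Y) = {X, Y}."""
    return frozenset({x, y})


def derive(lexicon: List[str], steps: int, seed: int = 0) -> List[SyntacticObject]:
    """Toy derivation (hypothetical, not MGP): repeatedly merge two workspace objects.

    The next workspace depends only on the current one, so the
    derivation is a Markov chain over workspaces.
    """
    rng = random.Random(seed)
    workspace: List[SyntacticObject] = list(lexicon)
    for _ in range(steps):
        if len(workspace) < 2:
            break  # nothing left to combine
        i, j = rng.sample(range(len(workspace)), 2)
        new_object = merge(workspace[i], workspace[j])
        # Replace the two merged objects with the newly built set.
        workspace = [o for k, o in enumerate(workspace) if k not in (i, j)]
        workspace.append(new_object)
    return workspace


def leaves(obj: SyntacticObject) -> Set[str]:
    """Collect the atomic lexical items contained in a syntactic object."""
    if isinstance(obj, str):
        return {obj}
    return set().union(*(leaves(child) for child in obj))


if __name__ == "__main__":
    result = derive(["x", "y", "+", "*"], steps=3)
    print(result)            # a single nested frozenset
    print(leaves(result[0]))  # all four lexical items
```

Each merge removes two objects from the workspace and adds one, so three steps reduce a four-item lexicon to a single hierarchical object. Because $MERGE$ yields unordered sets, the derived object does not by itself fix the argument order of non-commutative operators such as subtraction; a system that evaluates expressions would need additional information beyond this sketch.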