Synthesizing Formal Semantics from Executable Interpreters

Program verification and synthesis frameworks that allow one to customize the language in which one is interested typically require the user to provide a formally defined semantics for the language. Because writing a formal semantics can be a daunting and error-prone task, this requirement stands in the way of such frameworks being adopted by non-expert users. We present an algorithm that can automatically synthesize inductively defined syntax-directed semantics when given (i) a grammar describing the syntax of a language and (ii) an executable (closed-box) interpreter for computing the semantics of programs in the language of the grammar. Our algorithm synthesizes the semantics in the form of Constrained-Horn Clauses (CHCs), a natural, extensible, and formal logical framework for specifying inductively defined relations that has recently received widespread adoption in program verification and synthesis. The key innovation of our synthesis algorithm is a Counterexample-Guided Synthesis (CEGIS) approach that breaks the hard problem of synthesizing a set of constrained Horn clauses into small, tractable expression-synthesis problems that can be dispatched to existing SyGuS synthesizers. Our tool Synantic synthesized inductively-defined formal semantics from 14 interpreters for languages used in program-synthesis applications. When synthesizing formal semantics for one of our benchmarks, Synantic unveiled an inconsistency in the semantics computed by the interpreter for a language of regular expressions; fixing the inconsistency resulted in a more efficient semantics and, for some cases, in a 1.2x speedup for a synthesizer solving synthesis problems over such a language.

翻译：允许用户自定义目标语言的程序验证与合成框架通常要求用户提供该语言的形式化语义定义。由于编写形式语义是一项艰巨且易出错的任务，这一要求阻碍了非专家用户对此类框架的采用。本文提出一种算法，能够在给定（i）描述语言语法的文法与（ii）用于计算该文法语言中程序语义的可执行（黑盒）解释器时，自动合成归纳定义的语法导向语义。我们的算法以约束霍恩子句（CHCs）的形式合成语义——这是一种自然、可扩展且形式化的逻辑框架，用于指定归纳定义的关系，近年来已在程序验证与合成领域得到广泛采用。我们合成算法的核心创新在于采用反例引导合成（CEGIS）方法，将合成一组约束霍恩子句的难题分解为若干小型、可处理的表达式合成子问题，这些子问题可交由现有的SyGuS合成器处理。我们的工具Synantic从14个用于程序合成应用的语言解释器中成功合成了归纳定义的形式语义。在为其中一个基准测试合成形式语义时，Synantic揭示了正则表达式语言解释器在计算语义时存在的不一致性问题；修复该不一致性后，不仅获得了更高效的语义定义，在某些情况下还使基于该语言的合成器求解合成问题的速度提升了1.2倍。