Syntax-guided synthesis is commonly used to generate programs that encode policies. In this approach, the set of programs that can be written in a domain-specific language defines the search space, and an algorithm searches within this space for programs that encode strong policies. In this paper, we propose an alternative method for synthesizing programmatic policies, in which we search within an approximation of the language's semantic space. We hypothesized that searching in semantic spaces is more sample-efficient than searching in syntax-based spaces. Our rationale is that search is more efficient when the algorithm evaluates different agent behaviors as it moves through the space. Syntax-based spaces often lack this property because small changes to a program's syntax often do not change the agent's behavior. We define semantic spaces by learning a library of programs that exhibit different agent behaviors. We then approximate the semantic space by defining a neighborhood function for local search algorithms that replaces parts of the current candidate program with programs from the library. We evaluated our hypothesis in MicroRTS, a real-time strategy game. Empirical results support our hypothesis that searching in semantic spaces can be more sample-efficient than searching in syntax-based spaces.
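To make the library and the neighborhood function concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical toy arithmetic DSL in place of the MicroRTS DSL, treats a program's outputs on a fixed set of observations as its "behavior", keeps one program per distinct behavior as the library, and uses plain first-improvement hill climbing as the local search. All names (`run`, `behavior`, `build_library`, `semantic_neighbors`, `hill_climb`) are illustrative.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical toy DSL: a program is a nested tuple
# ("x",) | ("const", c) | ("add", p, q) | ("mul", p, q).
Program = tuple


def run(p: Program, x: int) -> int:
    """Evaluate a program on a single observation x."""
    op = p[0]
    if op == "x":
        return x
    if op == "const":
        return p[1]
    a, b = run(p[1], x), run(p[2], x)
    return a + b if op == "add" else a * b


def behavior(p: Program, observations: List[int]) -> Tuple[int, ...]:
    """Behavior signature (an assumption): outputs on a fixed set of observations."""
    return tuple(run(p, x) for x in observations)


def enumerate_programs(depth: int) -> List[Program]:
    """All programs of the toy grammar up to the given depth (duplicates possible)."""
    progs: List[Program] = [("x",)] + [("const", c) for c in (0, 1, 2)]
    for _ in range(depth):
        progs = progs + [(op, a, b) for op in ("add", "mul") for a in progs for b in progs]
    return progs


def build_library(programs: List[Program], observations: List[int]) -> List[Program]:
    """Keep the first program seen for each distinct behavior signature."""
    library: Dict[Tuple[int, ...], Program] = {}
    for p in programs:
        library.setdefault(behavior(p, observations), p)
    return list(library.values())


def subtree_paths(p: Program, path: Tuple[int, ...] = ()) -> Iterator[Tuple[int, ...]]:
    """Yield the path (child indices) to every subtree, root included."""
    yield path
    if p[0] in ("add", "mul"):
        yield from subtree_paths(p[1], path + (1,))
        yield from subtree_paths(p[2], path + (2,))


def replace_at(p: Program, path: Tuple[int, ...], new: Program) -> Program:
    """Return a copy of p with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    children = list(p)
    children[path[0]] = replace_at(p[path[0]], path[1:], new)
    return tuple(children)


def semantic_neighbors(p: Program, library: List[Program]) -> Iterator[Program]:
    """Neighborhood: swap any subtree of p for any behaviorally distinct library program."""
    for path in subtree_paths(p):
        for q in library:
            yield replace_at(p, path, q)


def hill_climb(score: Callable[[Program], float], library: List[Program],
               start: Program, max_evals: int = 10_000) -> Tuple[Program, float, int]:
    """First-improvement hill climbing; each score() call counts as one sample."""
    current, best, evals = start, score(start), 1
    improved = True
    while improved and evals < max_evals:
        improved = False
        for n in semantic_neighbors(current, library):
            s = score(n)
            evals += 1
            if s > best:
                current, best, improved = n, s, True
                break
            if evals >= max_evals:
                break
    return current, best, evals


if __name__ == "__main__":
    obs = [0, 1, 2, 3]
    target = [x * x + 1 for x in obs]
    library = build_library(enumerate_programs(depth=1), obs)

    def score(p: Program) -> float:
        # Hypothetical objective: negative L1 distance to a target behavior.
        return -sum(abs(a - b) for a, b in zip(behavior(p, obs), target))

    print(hill_climb(score, library, start=("x",)))
```

Because the library keeps only one program per behavior, every neighbor differs from the current program by a behaviorally distinct component, which is the intuition behind the hypothesis. Counting `score` calls approximates sample cost; in a game setting each call would correspond to evaluating a policy by playing matches. Plain hill climbing like this can stall in local optima, so the sketch makes no claim about solution quality.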