We introduce IsalProgram (Instruction Set and Language for Programming), a novel assembly-like programming language with three distinctive theoretical properties: (1) it is a regular language in the sense of formal language theory, that is, the set of its valid programs is recognized by a finite automaton; (2) every finite string over the instruction alphabet is a syntactically valid program; and (3) it makes no explicit use of memory addresses or variable names, whether absolute or relative. Programs are finite sequences of tokens drawn from a fixed instruction set and are executed on a virtual machine whose sole data structure is a circular doubly linked list (CDLL) navigated by three data pointers, with control flow governed by two code pointers. We give a complete formal definition of the language and its virtual machine, prove its regularity, and demonstrate its expressive power. We further discuss IsalProgram's potential advantages as a target language for neural program synthesis, the suitability of its program space for metric-based exploration via the Levenshtein edit distance, and directions for analyzing computability and complexity within this framework.
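As an informal illustration of properties (1) and (2) and of the edit-distance metric mentioned above, the following Python sketch may help. It is not the formal construction given in the paper: the three-token alphabet `ALPHABET` is a hypothetical placeholder and not the actual IsalProgram instruction set, and the function names `is_valid_program` and `levenshtein` are introduced here for illustration only. Under that assumption, validity reduces to membership in the set of all finite strings over the alphabet, which a one-state finite automaton recognizes, and the distance between two programs is the standard token-level Levenshtein distance.

```python
from typing import Sequence

# Hypothetical placeholder tokens; NOT the actual IsalProgram instruction set.
ALPHABET = frozenset({"A", "B", "C"})


def is_valid_program(tokens: Sequence[str]) -> bool:
    """Simulate a one-state DFA whose single state is initial and accepting.

    Every token of the alphabet loops back to that state, so every finite
    string over the alphabet (including the empty one) is accepted.
    """
    return all(token in ALPHABET for token in tokens)


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance using the standard two-row recurrence."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i tokens of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

Because every string over the alphabet is a valid program, any single insertion, deletion, or substitution of a token maps a valid program to another valid program, which is what makes the edit distance a natural metric on the whole program space.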