Building a high-performance JIT-capable VM for a dynamic language has traditionally required a tremendous amount of time, money, and expertise. We present Deegen, a meta-compiler that allows users to generate a high-performance JIT-capable VM for their own language at an engineering cost similar to writing a simple interpreter. Deegen takes in the execution semantics of the bytecodes implemented as C++ functions, and automatically generates a two-tier VM execution engine with a state-of-the-art interpreter, a state-of-the-art baseline JIT, and the tier-switching logic that connects them into a self-adaptive system. We are the first to demonstrate the automatic generation of a JIT compiler, and the automatic generation of an interpreter that outperforms the state of the art. Our performance comes from a long list of optimizations supported by Deegen, including bytecode specialization and quickening, register pinning, tag register optimization, call inline caching, generic inline caching, JIT polymorphic IC, JIT IC inline slab, type-check removal and strength reduction, type-based slow-path extraction and outlining, JIT hot-cold code splitting, and JIT OSR-entry. These optimizations are either employed automatically, or guided by the language implementer through intuitive APIs. As a result, the disassembly of the Deegen-generated interpreter, baseline JIT, and the generated JIT code rivals the assembly code hand-written by experts in state-of-the-art VMs. We implement LuaJIT Remake (LJR), a standard-compliant Lua 5.1 VM, using Deegen. Across 44 benchmarks, LJR's interpreter is on average 179% faster than the official PUC Lua interpreter, and 31% faster than LuaJIT's interpreter. LJR's baseline JIT has negligible startup delay, and its execution performance is on average 360% faster than PUC Lua and only 33% slower (but faster on 13/44 benchmarks) than LuaJIT's optimizing JIT.
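To make the input model concrete, the following is a minimal sketch of what "execution semantics of a bytecode implemented as a C++ function" could look like, together with a naive switch-based interpreter loop that calls it. This is not Deegen's API: `Value`, `Op`, `Insn`, `AddSemantics`, `AddSlowPath`, and `Run` are hypothetical names introduced only for illustration, and the slow path is a placeholder that returns nil rather than performing a Lua metamethod lookup. The sketch only illustrates the separation between a type-checked fast path and an out-of-line slow path; it does not reproduce any of the optimizations listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical value type: nil, number, or string (a tiny subset of Lua values).
using Value = std::variant<std::monostate, double, std::string>;

// Hypothetical two-opcode bytecode set.
enum class Op { Add, Ret };

// Hypothetical instruction layout: opcode plus three register indices.
struct Insn {
    Op op;
    std::size_t dst, lhs, rhs;
};

// Slow path, kept in its own function so the fast path stays small.
// Placeholder assumption: a real Lua VM would look up the __add metamethod
// or raise an error here; this sketch simply returns nil.
Value AddSlowPath(const Value&, const Value&) {
    return Value{};
}

// Execution semantics of the Add bytecode, written as an ordinary C++ function.
Value AddSemantics(const Value& a, const Value& b) {
    if (const double* x = std::get_if<double>(&a)) {
        if (const double* y = std::get_if<double>(&b)) {
            return *x + *y;  // fast path: both operands are numbers
        }
    }
    return AddSlowPath(a, b);
}

// Naive interpreter loop that dispatches each instruction to its semantics.
Value Run(const std::vector<Insn>& code, std::vector<Value> regs) {
    for (const Insn& i : code) {
        assert(i.dst < regs.size() && i.lhs < regs.size() && i.rhs < regs.size());
        switch (i.op) {
        case Op::Add:
            regs[i.dst] = AddSemantics(regs[i.lhs], regs[i.rhs]);
            break;
        case Op::Ret:
            return regs[i.dst];
        }
    }
    return Value{};  // fell off the end without a Ret: return nil
}
```

Calling `Run` with the program `{{Op::Add, 2, 0, 1}, {Op::Ret, 2, 0, 0}}` and registers holding two numbers returns their sum in register 2; if either operand is not a number, the placeholder slow path yields nil. The point of the design is that the language implementer writes only functions like `AddSemantics`, while everything around them (dispatch, register handling, and in Deegen's case the JIT) is the part a meta-compiler would generate.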