Building a high-performance JIT-capable VM for a dynamic language has traditionally required a tremendous amount of time, money, and expertise. We present Deegen, a meta-compiler that allows users to generate a high-performance JIT-capable VM for their own language at an engineering cost similar to writing a simple interpreter. Deegen takes in the execution semantics of the bytecodes implemented as C++ functions, and automatically generates a two-tier VM execution engine with a state-of-the-art interpreter, a state-of-the-art baseline JIT, and the tier-switching logic that connects them into a self-adaptive system. We are the first to demonstrate the automatic generation of a JIT compiler, and the automatic generation of an interpreter that outperforms the state of the art. Our performance comes from a long list of optimizations supported by Deegen, including bytecode specialization and quickening, register pinning, tag register optimization, call inline caching, generic inline caching, JIT polymorphic IC, JIT IC inline slab, type-check removal and strength reduction, type-based slow-path extraction and outlining, JIT hot-cold code splitting, and JIT OSR-entry. These optimizations are either employed automatically, or guided by the language implementer through intuitive APIs. As a result, the disassembly of the Deegen-generated interpreter, baseline JIT, and the generated JIT code rivals the assembly code hand-written by experts in state-of-the-art VMs. We implement LuaJIT Remake (LJR), a standard-compliant Lua 5.1 VM, using Deegen. Across 44 benchmarks, LJR's interpreter is on average 179% faster than the official PUC Lua interpreter, and 31% faster than LuaJIT's interpreter. LJR's baseline JIT has negligible startup delay, and its execution performance is on average 360% faster than PUC Lua and only 33% slower (but faster on 13/44 benchmarks) than LuaJIT's optimizing JIT.
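To make one of the listed optimizations concrete, the following is a minimal sketch of a monomorphic inline cache attached to a property-read bytecode whose semantics are written as an ordinary C++ function. It is not Deegen's API: the names `Shape`, `Object`, `InlineCache`, and `GetField` are hypothetical and chosen only for illustration, and the sketch assumes that each bytecode site always reads the same property name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical object model: a Shape maps property names to slot indices,
// and all objects that share a Shape share the same slot layout.
struct Shape {
    std::unordered_map<std::string, std::size_t> slotOf;
};

struct Object {
    const Shape* shape;
    std::vector<double> slots;
};

// One cache per bytecode site. Monomorphic: it remembers a single shape.
// Assumption: the property name at a given site never changes, so the
// cached slot stays valid whenever the shape matches.
struct InlineCache {
    const Shape* cachedShape = nullptr;
    std::size_t cachedSlot = 0;
    std::uint64_t hits = 0;
    std::uint64_t misses = 0;
};

// Semantics of a hypothetical "GetField" bytecode as a plain C++ function.
double GetField(const Object& obj, const std::string& name, InlineCache& ic) {
    // Fast path: one pointer comparison replaces the hash-table lookup.
    if (obj.shape == ic.cachedShape) {
        ++ic.hits;
        return obj.slots[ic.cachedSlot];
    }

    // Slow path: full lookup, then repopulate the cache for the next visit.
    ++ic.misses;
    auto it = obj.shape->slotOf.find(name);
    if (it == obj.shape->slotOf.end()) {
        return 0.0;  // hypothetical stand-in for a "missing property" value
    }
    ic.cachedShape = obj.shape;
    ic.cachedSlot = it->second;
    return obj.slots[it->second];
}
```

In this sketch, the first read at a site takes the slow path and fills the cache; later reads on objects with the same shape take the fast path. A polymorphic cache would keep several (shape, slot) pairs per site instead of one.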