C-based interpreters such as CPython make extensive use of C "extension" code, which is opaque to static analysis tools and to faster runtimes with JIT compilers, such as PyPy. Not only are the extensions opaque, but the interface between the dynamic language types and the C types can also introduce an impedance mismatch. We hypothesise that frequent calls to C extension code introduce significant overhead that is often unnecessary. We validate this hypothesis by introducing a simple technique, "typed methods", which allows selected C extension functions to have additional metadata attached to them in a backward-compatible way. This additional metadata makes it much easier for a JIT compiler (and, as we show, even an interpreter!) to significantly reduce the call and return overhead. Although we have prototyped typed methods in PyPy, we suspect that the same technique is applicable to a wider variety of language runtimes and that the information can also be consumed by static analysis tooling.
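
The idea of attaching optional type metadata to an extension function can be illustrated with a toy model. The sketch below is not the PyPy prototype or any real C API: `Box`, `TypedSignature`, `MethodDef`, `call_generic`, and `call_typed` are all hypothetical names, and Python objects stand in for what would be C structs and function pointers. It only shows, under those assumptions, how a runtime that recognises the metadata could skip boxing and unboxing while a runtime that ignores it keeps working unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class Box:
    """Stand-in for a heap-allocated dynamic-language object wrapping a C long."""

    allocations = 0  # counts how many boxes have been created

    def __init__(self, value: int) -> None:
        Box.allocations += 1
        self.value = value


@dataclass(frozen=True)
class TypedSignature:
    """Hypothetical metadata: raw argument and return types plus a raw entry point."""

    arg_types: Tuple[type, ...]
    return_type: type
    raw_entry: Callable[..., int]


@dataclass(frozen=True)
class MethodDef:
    """Hypothetical method-table entry; `typed` is optional, so entries
    without metadata remain valid (the backward-compatible part)."""

    name: str
    boxed_entry: Callable[[Tuple[Box, ...]], Box]
    typed: Optional[TypedSignature] = None


def raw_inc(x: int) -> int:
    """The actual work, operating on an unboxed value."""
    return x + 1


def boxed_inc(args: Tuple[Box, ...]) -> Box:
    """Generic entry point: unbox the argument, compute, box the result."""
    (arg,) = args
    return Box(raw_inc(arg.value))


INC = MethodDef("inc", boxed_inc, TypedSignature((int,), int, raw_inc))


def call_generic(method: MethodDef, x: int) -> int:
    """A runtime unaware of the metadata: box the argument, call, unbox the result."""
    return method.boxed_entry((Box(x),)).value


def call_typed(method: MethodDef, x: int) -> int:
    """A runtime aware of the metadata: take the raw path when the types match,
    otherwise fall back to the generic boxed call."""
    sig = method.typed
    if sig is not None and sig.arg_types == (int,) and type(x) is int:
        return sig.raw_entry(x)
    return call_generic(method, x)
```

In this model the generic call allocates two boxes per call (one for the argument, one for the result), while the typed call allocates none. An entry created without metadata, such as `MethodDef("inc", boxed_inc)`, still goes through the boxed path.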