We propose a new language feature for ML-family languages, the ability to selectively unbox certain data constructors, so that their runtime representation gets compiled away to just the identity on their argument. Unboxing must be statically rejected when it could introduce confusions, that is, distinct values with the same representation. We discuss the use-case of big numbers, where unboxing allows to write code that is both efficient and safe, replacing either a safe but slow version or a fast but unsafe version. We explain the static analysis necessary to reject incorrect unboxing requests. We present our prototype implementation of this feature for the OCaml programming language, discuss several design choices and the interaction with advanced features such as Guarded Algebraic Datatypes. Our static analysis requires expanding type definitions in type expressions, which is not necessarily normalizing in presence of recursive type definitions. In other words, we must decide normalization of terms in the first-order lambda-calculus with recursion. We provide an algorithm to detect non-termination on-the-fly during reduction, with proofs of correctness and completeness. Our termination-monitoring algorithm turns out to be closely related to the normalization strategy for macro expansion in the `cpp` preprocessor.
翻译:我们提出ML语言家族的一个新语言特性:选择性解箱特定数据构造子的能力,使其运行时表示被编译为仅保留其参数的恒等映射。当解箱可能引入混淆(即不同值具有相同表示)时,必须在静态阶段拒绝解箱操作。我们讨论大数值的使用场景,其中解箱允许编写既高效又安全的代码,取代要么安全但缓慢、要么快速但不安全的版本。我们解释了拒绝错误解箱请求所需的静态分析,并给出了该特性在OCaml编程语言中的原型实现,讨论了多种设计选择及其与防卫代数数据类型等高级特性的交互。我们的静态分析需要在类型表达式中展开类型定义,这在存在递归类型定义时未必能归一化。换言之,我们必须判定带递归的一阶λ演算中项的归一化问题。我们提供了一种在规约过程中实时检测非终止的算法,并附带正确性与完备性证明。该终止监控算法最终被发现与cpp预处理器中宏展开的归一化策略密切相关。