Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel $12^{th}$ and $13^{th}$ generations.
翻译:大多数软件领域依赖编译器将高级语言代码翻译为多种不同的机器语言,且性能不至于显著低于开发者耐心用汇编语言直接编写的代码。然而,密码学领域长期属于例外——众多性能关键的例程直接以汇编语言编写(有时通过元编程层实现)。过去已有研究展示了如何对这些汇编代码进行形式化验证,另一些工作则展示了如何自动生成附带形式化证明的C代码,但与已知最优汇编代码相比存在性能损失。我们提出CryptOpt,这是首个将高级密码学函数式程序编译为汇编代码的流水线,其生成代码的速度显著快于GCC或Clang的输出,并附带机械化证明(基于Coq),最终定理陈述几乎不超出输入函数式程序与x86-64汇编操作语义的范畴。在优化层面,我们对汇编程序的解空间实施随机化搜索,并通过目标CPU上的重复自动基准测试进行评估。在形式化验证层面,我们对接Fiat Cryptography框架(该框架将函数式程序转换为类C的IR代码),并为其扩展了新的经过形式化验证的程序等价性检查器,该检查器融合了SMT求解器与符号执行引擎已知功能的适度子集。整个原型系统具备高度实用性:例如,为Curve25519(TLS标准组成部分)和比特币椭圆曲线secp256k1在Intel第12代与第13代处理器上生成了已知最快的有限域算术实现。