Formal program verification is a longstanding goal in the field. We present the first quantitative comparison of the two primary compiler verification approaches, credible compilation/translation validation and full verification. Working with the first verified compiler developed by a coding agent (operating under human supervision), we present quantitative results from a coding agent implementing several optimizations using these two approaches. The results indicate that 1) verification requires roughly an order of magnitude more development effort than credible compilation, 2) to enhance provability, the coding agent chooses less efficient algorithms and data structures for verified optimizations, and 3) in an attempt to minimize proof effort the coding agent repeatedly implemented optimization scope reductions for verified optimizations, and 4) certificate checking time dominates optimization and certificate generation time for the considered optimizations. Because of the increased proof overhead, verified optimizations required substantially more supervision and coding sessions than credible compilation optimizations. Given the capabilities of a modern coding agent working in this context, implementation efforts for both credible compilation and verified versions remained feasible for the considered optimizations (unreachable code elimination, dead assignment elimination, and constant propagation/folding).
翻译:形式化程序验证是领域内的长期目标。我们首次对两种主流的编译器验证方法——可信编译/翻译验证与全验证——进行定量比较。以首个由编码智能体(在人类监督下运行)开发的经过验证的编译器为实验对象,我们提供了编码智能体使用这两种方法实现若干优化时的定量结果。结果表明:1)验证所需的开发工作量比可信编译高出约一个数量级;2)为增强可证性,编码智能体为经过验证的优化选择效率更低的算法与数据结构;3)为最小化证明工作量,编码智能体反复缩减已验证优化的优化范围;4)对于所考虑的优化,证书检查时间在优化及证书生成时间中占据主导地位。由于证明开销的增加,与可信编译优化相比,验证优化需要更多的监督与编码会话。鉴于当前编码智能体在此语境下的能力,针对所考虑的优化(不可达代码消除、死赋值消除及常量传播/折叠),可信编译与验证版本的实现工作量均保持在可行范围内。