One of the main challenges of N-Version Programming is development cost: it requires paying multiple teams to develop variants of the same system. To address this issue, we propose the automated generation of variants using large language models. We design, develop and evaluate Gal\'apagos: a tool for generating program variants using LLMs, validating their correctness and equivalence, and using them to assemble N-Version binaries. We evaluate Gal\'apagos by creating N-Version components of real-world C code. Our original results show that Gal\'apagos can produce program variants that are proven to be functionally equivalent, even when the variants are written in a different programming language. Our systematic diversity measurement indicate that functionally equivalent variants produced by Gal\'apagos, are statically different after compilation, and present diverging internal behavior at runtime. We demonstrate that the variants produced by Gal\'apagos can protect C code against real miscompilation bugs which affect the Clang compiler. Overall, our paper shows that producing N-Version software can be drastically automated by advanced usage of practical formal verification and generative language models.
翻译:N版本编程面临的主要挑战之一是开发成本:它需要支付多个团队来开发同一系统的不同变体。为解决这一问题,我们提出利用大型语言模型自动生成程序变体。我们设计、开发并评估了Galápagos:这是一个利用LLMs生成程序变体、验证其正确性与等价性,并基于这些变体组装N版本二进制文件的工具。我们通过为实际C语言代码创建N版本组件来评估Galápagos。原创性实验结果表明,Galápagos能够生成经形式化验证功能等价的程序变体,即使这些变体采用不同的编程语言编写。系统性多样性度量表明,Galápagos生成的功能等价变体在编译后具有静态差异,并在运行时表现出不同的内部行为。我们证明Galápagos生成的变体能够保护C代码免受影响Clang编译器的实际错误编译漏洞的侵害。总体而言,本文表明通过先进的形式化验证技术与生成式语言模型的实践应用,N版本软件的生产过程可实现高度自动化。