Ownership in low-level intermediate representation

The concept of ownership in high level languages can aid both the programmer and the compiler to reason about the validity of memory operations. Previously, ownership semantics has been used successfully in high level automatic program verification to model a reference to data by a first order logic (FOL) representation of data instead of maintaining an address map. However, ownership semantics is not used in low level program verification. We have identified two challenges. First, ownership information is lost when a program is compiled to a low level intermediate representation (e.g., in LLVM IR). Second, pointers in low level programs point to bytes using an address map (e.g., in unsafe Rust) and thus the verification condition (VC) cannot always replace a pointer by its FOL abstraction. To remedy the situation, we develop ownership semantics for an LLVM like low level intermediate representation. Using these semantics, the VC can opportunistically model some memory accesses by a direct access of a pointer cache that stores byte representation of data. This scheme reduces instances where an address map must be maintained, especially for mostly safe programs that follow ownership semantics. For unsafe functionality, memory accesses are modelled by operations on an address map and we provide mechanisms to keep the address map and pointer cache in sync. We implement these semantics in SEABMC, a bit precise bounded model checker for LLVM. For evaluation, the source programs are assumed to be written in C. Since C does not have ownership built in, suitable macros are added that introduce and preserve ownership during translation to LLVM like IR for verification. This approach is evaluated on mature open source C code. For both handcrafted benchmarks and practical programs, we observe a speedup of $1.3x-5x$ during SMT solving.

翻译：高级语言中的所有权概念可协助程序员和编译器推理内存操作的有效性。先前，所有权语义已成功应用于高级自动程序验证中，通过数据的一阶逻辑表示而非维护地址映射来建模对数据的引用。然而，所有权语义在低级程序验证中尚未得到应用。我们识别出两大挑战：首先，当程序被编译至低级中间表示时，所有权信息会丢失；其次，低级程序中的指针通过地址映射指向字节，因此验证条件无法始终用一阶逻辑抽象替代指针。为改善此状况，我们为类LLVM低级中间表示开发了所有权语义。利用这些语义，验证条件可机会性地将部分内存访问建模为直接访问存储数据字节表示的指针缓存。该方案减少了必须维护地址映射的情况，尤其适用于遵循所有权语义的多数安全程序。对于不安全功能，内存访问通过地址映射操作进行建模，我们提供了保持地址映射与指针缓存同步的机制。我们在SEABMC中实现了这些语义。为进行评估，假设源程序用C语言编写。由于C语言未内置所有权，我们添加了合适的宏，在翻译至类LLVM中间表示进行验证时引入并保持所有权。该方法在成熟的C语言开源代码上进行了评估。无论是人工构造的基准测试还是实际程序，在SMT求解阶段均观察到$1.3x-5x$的加速。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日