In program analysis and automated bug fixing, a program's source code is commonly abstracted as an Abstract Syntax Tree (AST), which allows various static and dynamic analyses to be applied to programs written in a high-level language. However, ASTs suffer from exponential growth in size because identical nodes are often listed separately in the tree. To address this issue, we introduce a novel code representation schema, the Complex Structurally Balanced Abstract Semantic Graph (CSBASG), which represents code as a complex-weighted directed graph that lists each semantic element as a node and ensures structural balance for almost finitely enumerable code segments, such as those written in the modeling language Alloy. Our experiments show that CSBASG provides a one-to-one correspondence between Alloy predicates and complex-weighted graphs. We evaluate the effectiveness and efficiency of the CSBASG representation for Alloy models and identify future applications of CSBASG to Alloy code generation and automated repair.
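To make the redundancy argument concrete, the following minimal Python sketch contrasts a small tree, in which identical nodes are listed separately, with a graph that keeps one node per distinct label. It is an illustration only, not the CSBASG construction: the nested-tuple tree format, the helper names (`count_tree_nodes`, `to_weighted_graph`), and the toy weight encoding (real part counts occurrences of an edge, imaginary part sums the child positions) are all assumptions made for this example. Unlike the one-to-one correspondence claimed for CSBASG, this toy encoding is lossy, and it does not model structural balance.

```python
from collections import defaultdict

# Hypothetical tree for an Alloy-like formula: (n in n.next) and (n in n.next).
# A tuple is (label, child, child, ...); a bare string is a leaf.
ast = (
    "and",
    ("in", "n", ("join", "n", "next")),
    ("in", "n", ("join", "n", "next")),
)

def label(t):
    """Return the label of a tree node (tuple head or the leaf string)."""
    return t[0] if isinstance(t, tuple) else t

def children(t):
    """Return the children of a tree node (empty for leaves)."""
    return t[1:] if isinstance(t, tuple) else ()

def count_tree_nodes(t):
    """Count every node in the tree, including repeated identical ones."""
    return 1 + sum(count_tree_nodes(c) for c in children(t))

def to_weighted_graph(t):
    """Merge nodes that share a label and accumulate complex edge weights.

    Toy encoding (an assumption, not the paper's scheme): each occurrence of
    an edge parent -> child at child position `pos` adds complex(1, pos).
    """
    nodes = set()
    edges = defaultdict(complex)
    stack = [t]
    while stack:
        cur = stack.pop()
        nodes.add(label(cur))
        for pos, child in enumerate(children(cur)):
            edges[(label(cur), label(child))] += complex(1, pos)
            stack.append(child)
    return nodes, dict(edges)

nodes, edges = to_weighted_graph(ast)
print("tree nodes:", count_tree_nodes(ast))
print("graph nodes:", len(nodes))
for (src, dst), w in sorted(edges.items()):
    print(f"{src} -> {dst}: {w}")
```

In this example the tree holds 11 nodes while the merged graph holds 5, which is the kind of saving the abstract attributes to listing each semantic element once.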