In program analysis and automated bug fixing, a program's source code is commonly abstracted as an Abstract Syntax Tree (AST), which allows various static and dynamic analyses to be applied to programs written in a high-level language. However, ASTs suffer from exponential growth in data size because identical nodes are often listed separately in the tree. To address this issue, we introduce a novel code representation schema, the Complex Structurally Balanced Abstract Semantic Graph (CSBASG), which represents code as a complex-weighted directed graph that lists each semantic element as a node and ensures structural balance for almost finitely enumerable code segments, such as those of the modeling language Alloy. Our experiment shows that CSBASG provides a one-to-one correspondence between Alloy predicates and complex-weighted graphs. We evaluate the effectiveness and efficiency of the CSBASG representation for Alloy models and identify future applications of CSBASG to Alloy code generation and automated repair.