In the fields of program analysis and automated bug fixing, it is common to represent a program's source code abstractly as an Abstract Syntax Tree (AST), which allows various static and dynamic analyses to be applied to programs written in high-level languages. However, ASTs suffer from exponential growth in data size because identical nodes are often listed separately in the tree. To address this issue, we introduce a novel code representation schema, the Complex Structurally Balanced Abstract Semantic Graph (CSBASG), which represents code as a complex-weighted directed graph that lists each semantic element as a node and guarantees structural balance for almost finitely enumerable code segments, such as those written in the modeling language Alloy. Our experiments show that CSBASG provides a one-to-one correspondence between Alloy predicates and complex-weighted graphs. We evaluate the effectiveness and efficiency of the CSBASG representation for Alloy models and identify future applications of CSBASG to Alloy code generation and automated repair.
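To make the duplication problem concrete, here is a minimal Python sketch of hash-consing, a standard technique that merges structurally identical subtrees into one shared node and turns a tree into a directed acyclic graph. This is not the CSBASG construction. It omits the complex edge weights and the structural-balance property, which are not defined here. The tuple encoding of an Alloy-like expression and the function names `count_tree_nodes` and `to_shared_graph` are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy encoding: a leaf is a str, an inner node is (operator, child, ...).
Tree = Union[str, Tuple]


def count_tree_nodes(t: Tree) -> int:
    """Count nodes in tree form, where every repeated subtree is counted again."""
    if isinstance(t, str):
        return 1
    return 1 + sum(count_tree_nodes(c) for c in t[1:])


def to_shared_graph(t: Tree) -> Tuple[int, Dict[Tree, int], List[Tuple[int, int, int]]]:
    """Hash-cons the tree so each structurally identical subtree becomes one node.

    Returns (root id, table mapping subtree -> node id,
             edges as (parent id, child id, argument position)).
    """
    table: Dict[Tree, int] = {}
    edges: List[Tuple[int, int, int]] = []

    def visit(node: Tree) -> int:
        if node in table:  # identical subtree already has a node: reuse it
            return table[node]
        if isinstance(node, str):  # leaf: allocate a fresh node id
            table[node] = len(table)
            return table[node]
        child_ids = [visit(c) for c in node[1:]]  # children first, so ids are available
        nid = len(table)
        table[node] = nid
        for pos, cid in enumerate(child_ids):
            edges.append((nid, cid, pos))
        return nid

    root = visit(t)
    return root, table, edges


# Alloy-like predicate body: (x in S.r) and not (x in S.r)
expr = ("and",
        ("in", "x", ("join", "S", "r")),
        ("not", ("in", "x", ("join", "S", "r"))))

root, table, edges = to_shared_graph(expr)
print(count_tree_nodes(expr), len(table))  # prints: 12 7
```

The repeated subexpression `x in S.r` takes five tree nodes each time it appears, but only one graph node, which both the `and` node and the `not` node point to. Recording the argument position on each edge keeps the graph unambiguous for non-commutative operators.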