We present IsalGraph, a method for representing the structure of any finite, simple graph as a compact string over a nine-character instruction alphabet. The encoding is executed by a small virtual machine comprising a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. Instructions either move a pointer through the CDLL or insert a node or edge into the graph. A key design property is that every string over the alphabet decodes to a valid graph, with no invalid states reachable. A greedy \emph{GraphToString} algorithm encodes any connected graph into a string in time polynomial in the number of nodes; an exhaustive-backtracking variant produces a canonical string by selecting the lexicographically smallest shortest string across all starting nodes and all valid traversal orders. We evaluate the representation on five real-world graph benchmark datasets (IAM Letter LOW/MED/HIGH, LINUX, and AIDS) and show that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance (GED). Together, these properties make IsalGraph strings a compact, isomorphism-invariant, and language-model-compatible sequential encoding of graph structure, with direct applications in graph similarity search, graph generation, and graph-conditioned language modelling.
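As a rough illustration of the string-side quantity used in the evaluation, the following is a minimal sketch of the standard unit-cost Levenshtein distance between two strings. It is not taken from the IsalGraph implementation: the function name `levenshtein` is an assumption, and the example inputs are arbitrary placeholder strings rather than real IsalGraph encodings, since the nine-character alphabet is not listed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Placeholder strings only; these are NOT valid IsalGraph encodings.
    print(levenshtein("kitten", "sitting"))
```

Under this sketch, comparing two graphs would reduce to encoding each one as a string and calling `levenshtein` on the pair; the dynamic program runs in time proportional to the product of the two string lengths.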