The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps on relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for any graph. Despite this property, we show that simulation in our construction is limited by finite precision. Finally, we show a Turing completeness result with constant width when the extra attention heads are utilized.