The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates a deeper understanding of how neural networks can replicate reasoning steps on relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for graphs of any size. Despite this property, we show that simulation in our solution is limited by finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.
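As a point of reference for the class of algorithms the abstract names, here is a minimal classical Breadth-First Search on an adjacency list, the kind of per-step graph computation the looped transformer is constructed to simulate. This sketch is illustrative only; the function name and toy graph are not from the paper.

```python
from collections import deque

def bfs(adj, source):
    """Breadth-First Search: return hop distances from `source`.

    `adj` maps each node to a list of its out-neighbors.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit fixes the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy directed graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

Note that each iteration of the while-loop touches only local neighborhood information, which is the step-by-step structure a looped architecture must reproduce once per loop iteration.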