Stack graphs: Name resolution at scale

We present stack graphs, an extension of Visser et al.'s scope graphs framework. Stack graphs power Precise Code Navigation at GitHub, allowing users to navigate name binding references both within and across repositories. Like scope graphs, stack graphs encode the name binding information about a program in a graph structure, in which paths represent valid name bindings. Resolving a reference to its definition is then implemented with a simple path-finding search. GitHub hosts millions of repositories, containing petabytes of total code, implemented in hundreds of different programming languages, and receiving thousands of pushes per minute. To support this scale, we ensure that the graph construction and path-finding judgments are file-incremental: for each source file, we create an isolated subgraph without any knowledge of, or visibility into, any other file in the program. This lets us eliminate the storage and compute costs of reanalyzing file versions that we have already seen. Since most commits change a small fraction of the files in a repository, this greatly amortizes the operational costs of indexing large, frequently changed repositories over time. To handle type-directed name lookups (which require "pausing" the current lookup to resolve another name), our name resolution algorithm maintains a stack of the currently paused (but still pending) lookups. Stack graphs can be constructed via a purely syntactic analysis of the program's source code, using a new declarative graph construction language. This means that we can extract name binding information for every repository without any per-package configuration, and without having to invoke an arbitrary, untrusted, package-specific build process.
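The resolution idea described above can be illustrated with a minimal sketch. It is not GitHub's implementation: the `Graph` class, the `resolve` function, the node ids, and the three node kinds (`scope`, `push`, `pop`) are hypothetical names chosen for this example. It assumes that a reference pushes its symbol onto a stack, that a definition pops a matching symbol, and that a name binding is a path ending at a definition with an empty stack. The two files contribute subgraphs that touch only their own nodes and a shared `root` node, which mimics the file-incremental property.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Graph:
    # node id -> (kind, symbol, is_definition); kind is "scope", "push", or "pop"
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id, kind="scope", symbol=None, is_definition=False):
        self.nodes[node_id] = (kind, symbol, is_definition)
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, dst):
        self.edges[src].append(dst)


def resolve(graph, reference):
    """Breadth-first search from a reference (a push node).

    Returns the ids of definitions reached with an empty symbol stack.
    """
    _, symbol, _ = graph.nodes[reference]
    start = (reference, (symbol,))
    queue, seen, found = deque([start]), {start}, []
    while queue:
        node, stack = queue.popleft()
        for nxt in graph.edges[node]:
            kind, symbol, is_definition = graph.nodes[nxt]
            if kind == "push":
                new_stack = stack + (symbol,)
            elif kind == "pop":
                if not stack or stack[-1] != symbol:
                    continue  # symbol mismatch: this path is not a valid binding
                new_stack = stack[:-1]
            else:
                new_stack = stack
            if is_definition and not new_stack:
                found.append(nxt)  # complete path: the lookup is fully resolved
                continue
            state = (nxt, new_stack)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return found


def build_a(graph):
    # Subgraph for a hypothetical file a.py that defines `foo`.
    graph.add_node("a.mod", "pop", "a", is_definition=True)
    graph.add_node("a.dot", "pop", ".")
    graph.add_node("a.members")
    graph.add_node("a.foo", "pop", "foo", is_definition=True)
    graph.add_edge("root", "a.mod")
    graph.add_edge("a.mod", "a.dot")
    graph.add_edge("a.dot", "a.members")
    graph.add_edge("a.members", "a.foo")


def build_main(graph):
    # Subgraph for a hypothetical file main.py containing the expression `a.foo`.
    graph.add_node("main.ref_foo", "push", "foo")
    graph.add_node("main.dot", "push", ".")
    graph.add_node("main.ref_a", "push", "a")
    graph.add_node("main.scope")
    graph.add_edge("main.ref_foo", "main.dot")
    graph.add_edge("main.dot", "main.ref_a")
    graph.add_edge("main.ref_a", "main.scope")
    graph.add_edge("main.scope", "root")


g = Graph()
g.add_node("root")  # the only node shared between per-file subgraphs
build_a(g)
build_main(g)

print(resolve(g, "main.ref_foo"))
print(resolve(g, "main.ref_a"))
```

In this sketch, resolving `foo` in `a.foo` first pushes `.` and `a`, so the search must resolve `a` before it can continue into the members of that module. The stack holds the lookups that are paused but still pending. Because `build_a` and `build_main` never inspect each other's nodes, either subgraph could be rebuilt or reused from a cache without touching the other.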
