Stack graphs: Name resolution at scale

We present stack graphs, an extension of Visser et al.'s scope graphs framework. Stack graphs power Precise Code Navigation at GitHub, allowing users to navigate name binding references both within and across repositories. Like scope graphs, stack graphs encode the name binding information about a program in a graph structure, in which paths represent valid name bindings. Resolving a reference to its definition is then implemented with a simple path-finding search. GitHub hosts millions of repositories, containing petabytes of total code, implemented in hundreds of different programming languages, and receiving thousands of pushes per minute. To support this scale, we ensure that the graph construction and path-finding judgments are file-incremental: for each source file, we create an isolated subgraph without any knowledge of, or visibility into, any other file in the program. This lets us eliminate the storage and compute costs of reanalyzing file versions that we have already seen. Since most commits change a small fraction of the files in a repository, this greatly amortizes the operational costs of indexing large, frequently changed repositories over time. To handle type-directed name lookups (which require "pausing" the current lookup to resolve another name), our name resolution algorithm maintains a stack of the currently paused (but still pending) lookups. Stack graphs can be constructed via a purely syntactic analysis of the program's source code, using a new declarative graph construction language. This means that we can extract name binding information for every repository without any per-package configuration, and without having to invoke an arbitrary, untrusted, package-specific build process.
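
The toy Python sketch below makes the resolution model above concrete. It uses a simplified encoding of our own, not GitHub's. Every node pushes a symbol, pops a symbol, or is a plain scope. Each file gets its own subgraph, with node ids namespaced by file path, and files touch only at a shared root node. A reference resolves to a definition when some path from it reaches a definition node with an empty symbol stack. The names `FileGraph`, `resolve`, `ROOT`, and `MAX_STACK` are hypothetical. So is the two-file program: `a.py` defines class `A` with member `foo`, and `b.py` contains `x = A()` and `x.foo`. The member access `x.foo` shows the "pausing" behavior. Once `x` is popped, `.` and `foo` stay on the stack as the pending lookup, and the search resumes with `x`'s type, `A`.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ROOT = "ROOT"    # the only node shared between per-file subgraphs
MAX_STACK = 64   # crude guard against paths that push symbols forever


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str = "scope"            # "scope", "push", or "pop"
    symbol: Optional[str] = None   # symbol pushed or popped; None for scopes
    is_definition: bool = False


@dataclass
class FileGraph:
    """Subgraph built from one file, with no visibility into other files."""
    path: str
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, local_id: str, kind: str = "scope",
            symbol: Optional[str] = None, definition: bool = False) -> str:
        node_id = f"{self.path}:{local_id}"  # namespace ids by file
        self.nodes[node_id] = Node(node_id, kind, symbol, definition)
        return node_id

    def edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, []).append(dst)


def resolve(files: List[FileGraph], reference: str) -> List[str]:
    """Breadth-first path search from a reference, tracking a symbol stack."""
    nodes: Dict[str, Node] = {}
    edges: Dict[str, List[str]] = {}
    for g in files:  # merge subgraphs; they only meet at ROOT
        nodes.update(g.nodes)
        for src, dsts in g.edges.items():
            edges.setdefault(src, []).extend(dsts)

    results: List[str] = []
    seen = set()
    queue = deque([(reference, ())])  # (node id, symbol stack with top first)
    while queue:
        node_id, stack = queue.popleft()
        if (node_id, stack) in seen:
            continue
        seen.add((node_id, stack))
        node = nodes.get(node_id)  # ROOT has no Node entry
        if node is not None and node.kind == "push":
            stack = (node.symbol,) + stack
            if len(stack) > MAX_STACK:
                continue
        elif node is not None and node.kind == "pop":
            if not stack or stack[0] != node.symbol:
                continue  # mismatch: this path is not a valid binding
            stack = stack[1:]
            if node.is_definition and not stack:
                results.append(node_id)  # complete path: reference -> definition
                continue
        for succ in edges.get(node_id, []):
            queue.append((succ, stack))
    return results


# a.py:  class A:  def foo(): ...
a = FileGraph("a.py")
def_A = a.add("def_A", "pop", "A", definition=True)
a_dot = a.add("dot", "pop", ".")
def_foo = a.add("def_foo", "pop", "foo", definition=True)
a.edge(ROOT, def_A)     # A is visible globally
a.edge(def_A, a_dot)    # "A" followed by "." enters A's members...
a.edge(a_dot, def_foo)  # ...where foo is defined

# b.py:  x = A()  ...  x.foo
b = FileGraph("b.py")
scope = b.add("module")
def_x = b.add("def_x", "pop", "x", definition=True)
x_type = b.add("x_type", "push", "A")  # x's type is A: look up "A" next
ref_foo = b.add("ref_foo", "push", "foo")
b_dot = b.add("dot", "push", ".")
ref_x = b.add("ref_x", "push", "x")
b.edge(scope, def_x)
b.edge(scope, ROOT)
b.edge(def_x, x_type)
b.edge(x_type, scope)
b.edge(ref_foo, b_dot)  # x.foo pushes "foo", then ".", then "x"
b.edge(b_dot, ref_x)
b.edge(ref_x, scope)

print(resolve([a, b], ref_foo))  # ['a.py:def_foo']
print(resolve([a, b], ref_x))    # ['b.py:def_x']
```

`a.py` and `b.py` are built without seeing each other, so either file's subgraph could be cached and reused until that file changes. For simplicity, this sketch merges the subgraphs and searches them at query time. It also models only the symbol stack of pending lookups.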
