Data movement is the primary bottleneck in modern computing systems. For loop-based programs common in high-performance computing (HPC) and AI workloads, including matrix multiplication, tensor contraction, stencil computation, and einsum operations, the cost of moving data through the memory hierarchy often exceeds the cost of arithmetic. This paper presents AutoLALA, an open-source tool that analyzes data locality in affine loop programs. The tool accepts programs written in a small domain-specific language (DSL), lowers them to polyhedral sets and maps, and produces closed-form symbolic formulas for reuse distance and data movement complexity. AutoLALA implements the fully symbolic locality analysis of Zhu et al. together with the data movement distance (DMD) framework of Smith et al. In particular, it computes reuse distance as the cardinality of the image of the access space under the access map, avoiding both stack simulation and Denning's recursive working-set formulation. We describe the DSL syntax and its formal semantics, the polyhedral lowering pipeline that constructs timestamp spaces and access maps via affine transformations, and the sequence of Barvinok counting operations used to derive symbolic reuse-interval and reuse-distance distributions. The system is implemented in Rust as a modular library spanning three crates, with safe bindings to the Barvinok library. We provide both a command-line interface and an interactive web playground that renders the output formulas in LaTeX. The tool handles arbitrary affine loop nests, covering workloads such as tensor contractions, einsum expressions, stencil computations, and general polyhedral programs.
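To make the "image under the access map" view of reuse distance concrete, here is a minimal brute-force sketch. It is not AutoLALA's implementation or API. It enumerates a small rectangular loop nest explicitly instead of counting symbolically with Barvinok, and the helper names (`trace`, `reuse_distances`, `histogram`) are hypothetical. It assumes the inclusive convention: the reuse distance of an access at timestamp `t` is the number of distinct addresses in the image of the closed window `[t_prev, t]` under the access map, so an immediate reuse has distance 1. Other conventions subtract one.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical helper: enumerate a 2-deep rectangular loop nest
/// `for i in 0..n { for j in 0..n { ... } }` in execution order.
/// The index into the returned vector is the access timestamp and the
/// value is the address produced by the access map.
fn trace(n: i64, access: impl Fn(i64, i64) -> i64) -> Vec<i64> {
    let mut addrs = Vec::new();
    for i in 0..n {
        for j in 0..n {
            addrs.push(access(i, j));
        }
    }
    addrs
}

/// For every access that reuses an earlier address, return its reuse
/// distance: the size of the image of the timestamp window
/// [t_prev, t] under the access map (inclusive convention, assumed).
/// Cold (first) accesses produce no entry.
fn reuse_distances(addrs: &[i64]) -> Vec<usize> {
    let mut last_use: HashMap<i64, usize> = HashMap::new();
    let mut out = Vec::new();
    for (t, &a) in addrs.iter().enumerate() {
        if let Some(&t_prev) = last_use.get(&a) {
            // Image of the reuse interval: the distinct addresses touched in it.
            let image: HashSet<i64> = addrs[t_prev..=t].iter().copied().collect();
            out.push(image.len());
        }
        last_use.insert(a, t);
    }
    out
}

/// Reuse-distance distribution: distance -> number of reuses.
fn histogram(ds: &[usize]) -> BTreeMap<usize, usize> {
    let mut h = BTreeMap::new();
    for &d in ds {
        *h.entry(d).or_insert(0) += 1;
    }
    h
}

fn main() {
    let n = 4;
    // Hypothetical access A[j]: the inner loop sweeps the whole array.
    let sweep = trace(n, |_i, j| j);
    println!("A[j]: {:?}", histogram(&reuse_distances(&sweep)));
    // Hypothetical access A[i]: the same element is reused across j.
    let fixed = trace(n, |i, _j| i);
    println!("A[i]: {:?}", histogram(&reuse_distances(&fixed)));
}
```

For `n = 4` this prints `A[j]: {4: 12}` and `A[i]: {1: 12}`. In general the `A[j]` access yields N(N-1) reuses, each at distance N. A symbolic analysis of the kind the abstract describes aims to produce such formulas in N directly, without the quadratic enumeration that this sketch performs.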