Uproot reads ROOT files directly in pure Python but cannot yet evaluate expressions written in ROOT's TTreeFormula expression language. Despite its popularity, this language has only one implementation and no formal specification. In a package called "formulate," we define the language's syntax in standard BNF and parse it with Lark, a fast, modern parsing toolkit for Python. With formulate, users can convert ROOT TTreeFormula expressions into NumExpr expressions and Awkward Array manipulations. In this contribution, we describe BNF notation and the Look-Ahead Left-to-Right (LALR) parsing algorithm, whose running time scales linearly with expression length. We then present the challenges of interpreting TTreeFormula as a functional language: some function-like forms cannot be expressed as true functions. Finally, we describe the design of the abstract syntax tree that facilitates conversion among the three languages (TTreeFormula, NumExpr, and Awkward Array). The formulate package has zero package dependencies, so we are adding it to Uproot's dependencies; Uproot will then be able to evaluate TTreeFormula expressions, whether they are hand-written or embedded in a ROOT file as TTree aliases.
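To illustrate how a shared abstract syntax tree can make conversion between expression languages straightforward, the following is a minimal sketch in plain Python. It is not formulate's actual API: the node classes (`Symbol`, `Literal`, `BinaryOp`), the operator tables, and the functions `to_root` and `to_numexpr` are hypothetical names chosen for this example, and the tree is built by hand rather than by a Lark parser. The one language difference it relies on is that TTreeFormula spells logical "and" as `&&`, whereas NumExpr uses `&`.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types; formulate's real classes may differ.
@dataclass(frozen=True)
class Symbol:
    name: str  # a branch or variable name, e.g. "pt"


@dataclass(frozen=True)
class Literal:
    value: float


@dataclass(frozen=True)
class BinaryOp:
    op: str  # operator as spelled in TTreeFormula, e.g. "&&"
    left: "Node"
    right: "Node"


Node = Union[Symbol, Literal, BinaryOp]

# One operator table per target language (a small illustrative subset).
ROOT_OPS = {"+": "+", "*": "*", ">": ">", "&&": "&&"}
NUMEXPR_OPS = {"+": "+", "*": "*", ">": ">", "&&": "&"}


def emit(node: Node, ops: dict) -> str:
    """Walk the tree and print it using the given operator spellings."""
    if isinstance(node, Symbol):
        return node.name
    if isinstance(node, Literal):
        return repr(node.value)
    left = emit(node.left, ops)
    right = emit(node.right, ops)
    # Always parenthesize, so that differing operator precedence between
    # languages (e.g. "&" versus ">") cannot change the meaning.
    return f"({left} {ops[node.op]} {right})"


def to_root(node: Node) -> str:
    return emit(node, ROOT_OPS)


def to_numexpr(node: Node) -> str:
    return emit(node, NUMEXPR_OPS)


# Hand-built tree for the TTreeFormula expression: pt > 20.0 && eta > -2.5
tree = BinaryOp(
    "&&",
    BinaryOp(">", Symbol("pt"), Literal(20.0)),
    BinaryOp(">", Symbol("eta"), Literal(-2.5)),
)

print(to_root(tree))     # ((pt > 20.0) && (eta > -2.5))
print(to_numexpr(tree))  # ((pt > 20.0) & (eta > -2.5))
```

Because each target language is only a different traversal of the same tree, supporting an additional backend in this sketch would mean adding one more operator table rather than a new parser.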