This paper presents the main features of a system that transforms regular expressions into shorter equivalent expressions. The system also supports other operations useful for simplification, such as checking inclusion between regular languages. The main novelty of this work is that it combines known but distinct representations of regular languages into a single global data structure, which makes the operations more efficient. In addition, the representations of regular languages are reduced dynamically as operations are performed on them. Expressions are normalized, and each is represented by a unique integer identifier. Expressions found to be equivalent (i.e., denoting the same regular language) are grouped into equivalence classes, and a shortest representative is chosen from each class. The paper briefly describes the main algorithms that operate on the global data structure. Some are direct adaptations of well-known algorithms, but most incorporate new ideas that are essential to making the system efficient. Finally, to show its usefulness, the system is applied to examples from the literature, and statistics on randomly generated sets of expressions are provided.
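To make the idea of integer identifiers and equivalence classes concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the system's actual data structure: the class name `ExprStore`, its methods, the size measure (one unit per symbol or operator), and the single normalization rule shown (union treated as commutative and idempotent) are all hypothetical. Normalized expressions are interned so that each one gets a unique integer, and a union-find structure groups expressions that the caller has established to be equivalent, tracking a shortest member of each class.

```python
class ExprStore:
    """Hypothetical store: interned expressions plus union-find classes."""

    def __init__(self):
        self.ids = {}      # normalized node -> integer identifier
        self.nodes = []    # integer identifier -> normalized node
        self.size = []     # integer identifier -> count of symbols and operators
        self.parent = []   # union-find parent pointers
        self.best = []     # meaningful at class roots: shortest known member

    def _intern(self, key, size):
        # Return the existing identifier for this node, or allocate a new one.
        if key not in self.ids:
            ident = len(self.nodes)
            self.ids[key] = ident
            self.nodes.append(key)
            self.size.append(size)
            self.parent.append(ident)  # every new expression starts in its own class
            self.best.append(ident)
        return self.ids[key]

    def sym(self, name):
        return self._intern(("sym", name), 1)

    def node(self, op, *children):
        # Assumed normalization: union ("+") is commutative and idempotent,
        # so its operands are deduplicated and sorted by identifier.
        if op == "+":
            children = tuple(sorted(set(children)))
            if len(children) == 1:
                return children[0]
        size = 1 + sum(self.size[c] for c in children)
        return self._intern((op, *children), size)

    def find(self, e):
        # Root of the equivalence class of e, with path halving.
        while self.parent[e] != e:
            self.parent[e] = self.parent[self.parent[e]]
            e = self.parent[e]
        return e

    def merge(self, a, b):
        # Record that a and b denote the same language (decided by the caller).
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.best[ra] = min(self.best[ra], self.best[rb],
                                key=lambda e: self.size[e])

    def shortest(self, e):
        return self.best[self.find(e)]


store = ExprStore()
a, b = store.sym("a"), store.sym("b")
same_id = store.node("+", a, b) == store.node("+", b, a)  # both normalize to one node
a_star = store.node("*", a)
a_star_star = store.node("*", a_star)
store.merge(a_star_star, a_star)  # (a*)* and a* denote the same language
```

In this sketch, `store.shortest(a_star_star)` returns the identifier of `a*`, the smaller member of the merged class. How equivalence is actually detected is left to the caller; the sketch only records the result.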