Grammatical inference consists of learning a formal grammar, represented either as a finite state machine or as a set of rewrite rules. In this paper, we consider the inference of Nondeterministic Finite Automata (NFA) that must accept some words and reject others from a given sample. This problem can be modeled naturally in SAT. Because the standard model is enormous, models based on prefixes, suffixes, and hybrids were designed to generate smaller SAT instances. A simple property holds: if there is an NFA of size k for a given sample, there is also an NFA of size k+1. We first strengthen this property by adding some characteristics to the NFA of size k+1, which lets us tighten the bounds on the size of the minimal NFA for a given sample. We then propose simplified and refined models for NFA of size k+1 that are smaller than the initial models for NFA of size k. We also propose a reduction algorithm that builds an NFA of size k from a specific NFA of size k+1. Finally, we validate our proposal with experiments that show the efficiency of our approach.
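As an illustration of the basic property only, the following minimal Python sketch checks whether an NFA is consistent with a sample and pads an NFA of size k to one of size k+1 by adding an isolated state. The dictionary representation and the names `accepts`, `consistent`, and `pad_with_isolated_state` are assumptions made for this example. The sketch does not reproduce the strengthened property, the SAT models, or the reduction algorithm of the paper, whose details are not given here.

```python
def accepts(nfa, word):
    """Return True if the NFA accepts `word`, by simulating the set of active states."""
    current = set(nfa["initial"])
    for symbol in word:
        current = {t for s in current for t in nfa["delta"].get((s, symbol), ())}
    return bool(current & nfa["final"])


def consistent(nfa, positive, negative):
    """An NFA is consistent with a sample if it accepts every positive
    word and rejects every negative word."""
    return (all(accepts(nfa, w) for w in positive)
            and not any(accepts(nfa, w) for w in negative))


def pad_with_isolated_state(nfa):
    """Trivial version of the k -> k+1 property: add one state with no
    transitions, not initial and not final. The accepted language is unchanged."""
    return {
        "states": nfa["states"] + 1,
        "initial": set(nfa["initial"]),
        "final": set(nfa["final"]),
        "delta": dict(nfa["delta"]),
    }


# Hypothetical sample over {a, b}: words ending in 'a' are positive.
positive = ["a", "ba", "aa"]
negative = ["", "b", "ab"]

# A 2-state NFA (states 0 and 1) for this sample.
nfa = {
    "states": 2,
    "initial": {0},
    "final": {1},
    "delta": {(0, "a"): {0, 1}, (0, "b"): {0}},
}

padded = pad_with_isolated_state(nfa)
print(consistent(nfa, positive, negative), consistent(padded, positive, negative))
```

The padding step shows why the unstrengthened property is immediate: an unreachable state cannot change which words are accepted. The paper's contribution is to require additional characteristics of the NFA of size k+1, which this sketch does not attempt to model.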