Analyzing Robustness of Angluin's L$^*$ Algorithm in Presence of Noise

Angluin's L$^*$ algorithm learns the minimal deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probably approximately correct (PAC) version replaces each equivalence query with a large number of random membership queries in order to obtain an answer with a high level of confidence. It can therefore be applied to any kind of device and may be viewed as an algorithm for synthesizing an automaton that abstracts the behavior of the device from observations. Here we are interested in how Angluin's PAC learning algorithm behaves for devices obtained from a DFA by introducing some noise. More precisely, we study whether Angluin's algorithm reduces the noise and produces a DFA closer to the original one than the noisy device is. We propose several ways to introduce the noise: (1) the noisy device inverts the classification of words w.r.t. the DFA with a small probability, (2) the noisy device modifies, with a small probability, the letters of the word before asking for its classification w.r.t. the DFA, (3) the noisy device combines the classification of a word w.r.t. the DFA with its classification w.r.t. a counter automaton, and (4) the noisy device is obtained by a random process from two DFAs such that the language of the first one is included in that of the second one: when a word is accepted by the first DFA (resp. rejected by the second one), the noisy device also accepts (resp. rejects) it, and in the remaining cases the word is accepted with probability 0.5. Our main experimental contributions consist in showing that Angluin's algorithm (1) behaves well whenever the noisy device is produced by a random process, (2) behaves poorly with structured noise, and (3) is able to eliminate pathological behaviours specified in a regular way. On the theoretical side, we show that randomness almost surely yields systems with non-recursively enumerable languages.
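
To make noise models (1), (2) and (4) concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than the construction used in the experiments: the `DFA` class, the function names, the choice to modify a letter by replacing it with a uniformly drawn one, and the choice to draw a fresh coin on every query. A device with a fixed noisy language would instead fix the random outcome once per word, for instance by caching it.

```python
import random
from dataclasses import dataclass


@dataclass
class DFA:
    """A complete DFA; every name here is hypothetical, chosen for this sketch."""
    alphabet: tuple
    transitions: dict      # maps (state, letter) to the next state
    initial: int
    accepting: frozenset

    def accepts(self, word):
        state = self.initial
        for letter in word:
            state = self.transitions[(state, letter)]
        return state in self.accepting


def noisy_output(dfa, word, p, rng):
    """Noise (1): invert the DFA's classification with probability p."""
    verdict = dfa.accepts(word)
    return (not verdict) if rng.random() < p else verdict


def noisy_input(dfa, word, p, rng):
    """Noise (2): perturb each letter with probability p, then classify.

    Assumption: a perturbed letter is replaced by one drawn uniformly
    from the alphabet (possibly the same letter).
    """
    perturbed = [rng.choice(dfa.alphabet) if rng.random() < p else letter
                 for letter in word]
    return dfa.accepts(perturbed)


def between_two_dfas(lower, upper, word, rng):
    """Noise (4): assumes L(lower) is included in L(upper).

    Words accepted by `lower` are accepted, words rejected by `upper`
    are rejected, and all other words are accepted with probability 0.5.
    """
    if lower.accepts(word):
        return True
    if not upper.accepts(word):
        return False
    return rng.random() < 0.5


# Example DFA over {a, b}: accepts words with an even number of 'a'.
EVEN_A = DFA(
    alphabet=("a", "b"),
    transitions={(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
    initial=0,
    accepting=frozenset({0}),
)
```

With `p = 0` both `noisy_output` and `noisy_input` coincide with the original DFA, which gives a convenient sanity check before running a learner against the noisy device.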

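
The replacement of an equivalence query by random membership queries can be sketched as follows. The sample size follows the standard bound from Angluin's PAC variant, $\lceil \frac{1}{\varepsilon}(\ln\frac{1}{\delta} + i \ln 2) \rceil$ for the $i$-th equivalence query. The remaining choices are assumptions made for illustration: the function names, the `max_len` parameter, and the word distribution (length uniform in `0..max_len`, letters uniform), since the abstract does not specify the sampling distribution.

```python
import math
import random


def sample_size(epsilon, delta, i):
    """Number of random membership queries replacing the i-th equivalence query."""
    return math.ceil((math.log(1 / delta) + i * math.log(2)) / epsilon)


def random_word(alphabet, max_len, rng):
    """Assumed distribution: uniform length in 0..max_len, uniform letters."""
    length = rng.randint(0, max_len)
    return tuple(rng.choice(alphabet) for _ in range(length))


def pac_equivalence(hypothesis, device, alphabet, epsilon, delta, i, rng,
                    max_len=20):
    """Approximate equivalence query.

    `hypothesis` and `device` are callables mapping a word to a bool.
    Returns a sampled word on which they disagree (a counterexample),
    or None if all sampled words agree.
    """
    for _ in range(sample_size(epsilon, delta, i)):
        word = random_word(alphabet, max_len, rng)
        if hypothesis(word) != device(word):
            return word
    return None
```

When the device is noisy, a returned counterexample may reflect noise rather than a genuine defect of the hypothesis, which is the situation whose effect on the learned DFA is studied here.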