Analyzing Robustness of Angluin's L$^*$ Algorithm in Presence of Noise

Angluin's L$^*$ algorithm learns the minimal deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probably approximately correct (PAC) version replaces each equivalence query with a large number of random membership queries in order to obtain a high level of confidence in the answer. It can therefore be applied to any kind of device, and may be viewed as an algorithm that synthesizes an automaton abstracting the behavior of the device from observations. Here we are interested in how Angluin's PAC learning algorithm behaves on devices obtained from a DFA by introducing some noise. More precisely, we study whether Angluin's algorithm reduces the noise and produces a DFA that is closer to the original one than the noisy device is. We propose several ways to introduce the noise: (1) the noisy device inverts, with a small probability, the classification of a word w.r.t. the DFA; (2) the noisy device modifies, with a small probability, the letters of the word before asking for its classification w.r.t. the DFA; (3) the noisy device combines the classification of a word w.r.t. the DFA with its classification w.r.t. a counter automaton; and (4) the noisy device is obtained by a random process from two DFAs such that the language of the first is included in the language of the second: a word accepted by the first DFA is accepted, a word rejected by the second DFA is rejected, and every remaining word is accepted with probability 0.5. Our main experimental contributions consist in showing that Angluin's algorithm (1) behaves well whenever the noisy device is produced by a random process, (2) behaves poorly with structured noise, and (3) is able to eliminate pathological behaviours specified in a regular way. On the theoretical side, we show that randomness almost surely yields systems whose languages are not recursively enumerable.
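To make noise models (1), (2) and (4) and the sampled equivalence query concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in the experiments. The names (`DFA`, `memoized`, `noisy_output`, `noisy_input`, `sandwich`, `sampled_equivalence`) are hypothetical. The sketch assumes that a modified letter is redrawn uniformly from the alphabet, that test words are drawn with a uniformly chosen length up to `max_len`, and that the answer for each word is fixed once drawn, so that the noisy device defines a language. Model (3) is left out because the abstract does not say how the two classifications are combined.

```python
import random
from dataclasses import dataclass


@dataclass
class DFA:
    """Complete DFA: `delta` maps (state, letter) to the next state."""
    alphabet: tuple
    delta: dict
    start: int
    accepting: frozenset

    def accepts(self, word):
        state = self.start
        for letter in word:
            state = self.delta[(state, letter)]
        return state in self.accepting


def memoized(classify):
    """Assumption: the device answers consistently, so each word's verdict is cached."""
    cache = {}

    def oracle(word):
        key = tuple(word)
        if key not in cache:
            cache[key] = classify(key)
        return cache[key]

    return oracle


def noisy_output(dfa, p, rng):
    """Noise (1): invert the DFA's verdict with probability p."""
    return memoized(lambda w: dfa.accepts(w) != (rng.random() < p))


def noisy_input(dfa, p, rng):
    """Noise (2): redraw each letter with probability p (assumed uniform), then classify."""
    def classify(word):
        perturbed = [rng.choice(dfa.alphabet) if rng.random() < p else a for a in word]
        return dfa.accepts(perturbed)

    return memoized(classify)


def sandwich(lower, upper, rng):
    """Noise (4): accept L(lower), reject outside L(upper), fair coin in between."""
    def classify(word):
        if lower.accepts(word):
            return True
        if not upper.accepts(word):
            return False
        return rng.random() < 0.5

    return memoized(classify)


def sampled_equivalence(hypothesis, oracle, n_samples, rng, max_len=10):
    """PAC stand-in for an equivalence query: return a sampled word on which the
    hypothesis and the device disagree, or None if no sample disagrees."""
    for _ in range(n_samples):
        length = rng.randint(0, max_len)
        word = tuple(rng.choice(hypothesis.alphabet) for _ in range(length))
        if hypothesis.accepts(word) != oracle(word):
            return word
    return None
```

Each factory returns a membership oracle that can be handed to a learner in place of the exact DFA. With `p = 0`, `noisy_output` and `noisy_input` coincide with the original DFA, which gives a quick sanity check before running noisy experiments.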
