Analyzing Robustness of Angluin's L$^*$ Algorithm in Presence of Noise

Angluin's L$^*$ algorithm learns the minimal deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probably approximately correct (PAC) version replaces each equivalence query with a large number of random membership queries, so that the answer holds with a high level of confidence. It can therefore be applied to any kind of device and may be viewed as an algorithm for synthesizing an automaton that abstracts the behavior of the device from observations. Here we are interested in how Angluin's PAC learning algorithm behaves on devices obtained from a DFA by introducing some noise. More precisely, we study whether Angluin's algorithm reduces the noise and produces a DFA closer to the original one than the noisy device is. We propose several ways to introduce the noise: (1) the noisy device inverts the classification of a word w.r.t. the DFA with a small probability; (2) the noisy device modifies the letters of the word with a small probability before asking for its classification w.r.t. the DFA; (3) the noisy device combines the classification of a word w.r.t. the DFA with its classification w.r.t. a counter automaton; and (4) the noisy device is obtained by a random process from two DFAs such that the language of the first is included in that of the second: a word accepted by the first DFA is accepted, a word rejected by the second DFA is rejected, and in the remaining cases the word is accepted with probability 0.5. Our main experimental contributions consist in showing that (1) Angluin's algorithm behaves well whenever the noisy device is produced by a random process, (2) it behaves poorly with structured noise, and (3) it is able to eliminate pathological behaviours specified in a regular way. Theoretically, we show that randomness almost surely yields systems with non-recursively enumerable languages.
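
To make noise models (1), (2) and (4) concrete, the following Python sketch wraps a small DFA in three noisy membership oracles. It is an illustration only, not the authors' implementation: the `DFA` class, the function names, and the choice to replace a perturbed letter by a uniformly drawn letter (which may coincide with the original) are all assumptions made here. The sketch also redraws the noise on every call; a device that must give a repeatable answer for each word would need to memoize its verdicts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    """Hypothetical minimal DFA representation used only for this sketch."""
    alphabet: tuple
    transitions: dict      # maps (state, letter) -> next state
    initial: int
    accepting: frozenset

    def accepts(self, word):
        state = self.initial
        for letter in word:
            state = self.transitions[(state, letter)]
        return state in self.accepting


def noisy_output(dfa, word, p, rng):
    """Noise (1): invert the DFA's classification with probability p."""
    verdict = dfa.accepts(word)
    return (not verdict) if rng.random() < p else verdict


def noisy_input(dfa, word, p, rng):
    """Noise (2): perturb each letter independently with probability p,
    then classify the perturbed word. Assumption: the replacement letter
    is drawn uniformly from the alphabet."""
    perturbed = [rng.choice(dfa.alphabet) if rng.random() < p else letter
                 for letter in word]
    return dfa.accepts(perturbed)


def between_two_dfas(inner, outer, word, rng):
    """Noise (4): assumes L(inner) is included in L(outer).
    Accepted by inner -> accept; rejected by outer -> reject;
    otherwise accept with probability 0.5."""
    if inner.accepts(word):
        return True
    if not outer.accepts(word):
        return False
    return rng.random() < 0.5


# Example DFA over {a, b}: accepts words with an even number of 'a'.
EVEN_A = DFA(
    alphabet=("a", "b"),
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    initial=0,
    accepting=frozenset({0}),
)

# Example DFA with the empty language, included in every other language.
EMPTY = DFA(
    alphabet=("a", "b"),
    transitions={(0, "a"): 0, (0, "b"): 0},
    initial=0,
    accepting=frozenset(),
)

if __name__ == "__main__":
    rng = random.Random(0)
    for w in ["", "a", "ab", "aab", "abab"]:
        print(repr(w), EVEN_A.accepts(w),
              noisy_output(EVEN_A, w, 0.1, rng),
              noisy_input(EVEN_A, w, 0.1, rng),
              between_two_dfas(EMPTY, EVEN_A, w, rng))
```

Any of these oracles can stand in for the membership oracle of a PAC variant of L$^*$, since that variant only needs to query the classification of individual words. Model (3) is omitted because the abstract does not say how the two classifications are combined.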
