Probabilistic deterministic finite automata (PDFAs) are discrete event systems that model conditional probabilities over languages: given a sequence of tokens seen so far, they return the probability that each token of interest appears next. Such models have gained interest in explainable machine learning, where they serve as surrogate models for neural networks trained as language models. In this work, we present an algorithm to distill PDFAs from neural networks. Our algorithm is derived from the L# algorithm and can learn PDFAs from a new type of query, in which it infers conditional probabilities from the probability that the queried string occurs. We demonstrate its effectiveness on a recent public dataset by distilling PDFAs from a set of trained neural networks.
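To make the query idea concrete, the following is a minimal sketch and not the algorithm presented in this work. It assumes that a query returns the probability that a generated sequence starts with the queried string, so that a conditional next-token probability can be recovered as the ratio P(w·a) / P(w). The `PDFA` class, the function `conditional_from_string_queries`, the end-of-sequence token `$`, and the toy automaton are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PDFA:
    """Toy PDFA: deterministic transitions plus a next-token distribution per state."""
    initial: int
    transitions: dict  # (state, token) -> next state
    next_token: dict   # state -> {token: probability}

    def prefix_probability(self, tokens):
        """Probability that a generated sequence starts with `tokens` (assumed query type)."""
        state, prob = self.initial, 1.0
        for tok in tokens:
            prob *= self.next_token[state].get(tok, 0.0)
            if prob == 0.0:
                return 0.0
            state = self.transitions[(state, tok)]
        return prob


def conditional_from_string_queries(query, prefix, alphabet):
    """Recover P(token | prefix) as query(prefix + token) / query(prefix)."""
    prefix = tuple(prefix)
    base = query(prefix)
    if base == 0.0:
        return None  # the prefix never occurs, so the conditional is undefined
    return {tok: query(prefix + (tok,)) / base for tok in alphabet}


# Hypothetical two-state automaton over {a, b} with end-of-sequence token "$".
# State 2 is a sink reached after "$"; nothing can follow it.
alphabet = ("a", "b", "$")
pdfa = PDFA(
    initial=0,
    transitions={
        (0, "a"): 1, (0, "b"): 0, (0, "$"): 2,
        (1, "a"): 0, (1, "b"): 1, (1, "$"): 2,
    },
    next_token={
        0: {"a": 0.5, "b": 0.3, "$": 0.2},
        1: {"a": 0.1, "b": 0.6, "$": 0.3},
        2: {},
    },
)

# In a distillation setting the query would be answered by the neural network;
# here the toy automaton plays that role.
cond = conditional_from_string_queries(pdfa.prefix_probability, ("a",), alphabet)
```

Here `cond` holds the next-token distribution of the state reached after reading `a`, obtained purely from string-level probabilities. A learner in the style of L# could compare such distributions across prefixes to decide which prefixes lead to the same state; how the presented algorithm actually does this is not specified here.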