Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data, yet understanding and analyzing their behaviours remains a significant challenge. To this end, many efforts have been made to extract finite automata from RNNs, since automata are more amenable to analysis and explanation. However, existing extraction approaches, such as exact learning and compositional approaches, are limited in either scalability or precision. In this paper, we propose a novel framework for Weighted Finite Automata (WFA) extraction and explanation that tackles these limitations for natural language tasks. First, to address the transition sparsity and context loss problems that we identify in WFA extraction for natural language tasks, we propose an empirical method to complement missing rules in the transition diagram, and we adjust the transition matrices to enhance the context-awareness of the WFA. We also propose two data augmentation tactics that track more of the dynamic behaviours of the RNN, which further improves extraction precision. Based on the extracted model, we propose an explanation method for RNNs comprising a word embedding method, Transition Matrix Embeddings (TME), and a TME-based, task-oriented explanation of the target RNN. Our evaluation demonstrates that our method achieves higher extraction precision than existing approaches, and that the TME-based explanation method is effective in applications to pretraining and adversarial example generation.
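For readers unfamiliar with WFAs, the following is a minimal sketch of how a weighted finite automaton assigns a weight to a word sequence under the standard textbook definition: an initial vector is multiplied by one transition matrix per word, then combined with a final vector. It is not the extraction method proposed in the paper. The names `wfa_score`, `ALPHA`, `BETA`, and `TRANSITIONS`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def vec_mat(v: Sequence[float], m: Matrix) -> List[float]:
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def wfa_score(
    alpha: Sequence[float],
    transitions: Dict[str, Matrix],
    beta: Sequence[float],
    words: Sequence[str],
) -> float:
    """Weight of a sequence: alpha * A_{w1} * ... * A_{wn} * beta."""
    state = list(alpha)
    for w in words:
        # Each word contributes its own transition matrix (assumed to exist).
        state = vec_mat(state, transitions[w])
    return sum(s * b for s, b in zip(state, beta))


# Toy two-state WFA; every number here is made up for illustration.
ALPHA = [1.0, 0.0]          # start entirely in state 0
BETA = [0.0, 1.0]           # only state 1 contributes to the output
TRANSITIONS = {
    "good": [[0.5, 0.5], [0.0, 1.0]],
    "bad": [[1.0, 0.0], [0.5, 0.5]],
}

if __name__ == "__main__":
    print(wfa_score(ALPHA, TRANSITIONS, BETA, ["good", "bad"]))
```

In this sketch a word with no entry in `TRANSITIONS` raises a `KeyError`, a crude analogue of the transition sparsity problem the abstract mentions. How the paper complements such missing rules is not specified in the abstract and is not modelled here.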