Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix faulty behavior in an RNN by replacing the portions associated with that behavior? Recent work on decomposing fully connected neural networks (FCNNs) and convolutional neural networks (CNNs) into modules has shown the value of engineering deep models in this manner, which is standard in traditional software engineering (SE) but foreign to deep learning models. However, prior work focuses on image-based multiclass classification problems and cannot be applied to RNNs due to (a) different layer structures, (b) loop structures, (c) different types of input-output architectures, and (d) the use of both nonlinear and logistic activation functions. In this work, we propose the first approach to decompose an RNN into modules. We study different types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN modules can be reused and replaced in various scenarios. We evaluate our approach on five canonical datasets (i.e., Math QA, Brown Corpus, Wiki-toxicity, Clinc OOS, and Tatoeba) and four model variants for each dataset. We found that decomposing a trained model has a small cost (accuracy: -0.6%; BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced without retraining.