Finite-State Transducers (FSTs) are effective models for string-to-string rewriting tasks and often provide the efficiency required by high-performance applications, but constructing transducers by hand is difficult. In this work, we propose a novel method for automatically constructing unweighted FSTs that follow the hidden-state geometry learned by a recurrent neural network. We evaluate our method on real-world datasets for morphological inflection, grapheme-to-phoneme prediction, and historical text normalization, showing that the constructed FSTs are highly accurate and robust on many datasets, substantially outperforming classical transducer-learning algorithms by up to 87% in accuracy on held-out test sets.
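To make the general idea concrete, the following Python sketch shows one plausible way to turn an RNN's hidden-state geometry into FST states: collect the hidden states visited while reading aligned input/output strings, cluster them, and read transitions off the cluster sequence. This is an illustrative assumption only, not the construction proposed in the paper; the untrained stand-in encoder, the use of KMeans, and all names (`run_rnn`, `pairs`, `q_init`) are hypothetical.

```python
# Hypothetical sketch: discretize RNN hidden states into FST states by clustering.
# The recurrent encoder below is an untrained stand-in for a trained model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy aligned training pairs (input char, output char), e.g. a rewrite a -> b.
pairs = [[("a", "b"), ("c", "c")], [("a", "b"), ("a", "b"), ("c", "c")]]

alphabet = sorted({x for seq in pairs for x, _ in seq})
one_hot = {c: np.eye(len(alphabet))[i] for i, c in enumerate(alphabet)}

H = 8  # hidden size of the stand-in recurrent encoder
W_x = rng.normal(size=(H, len(alphabet)))
W_h = rng.normal(size=(H, H))

def run_rnn(seq):
    """Return the hidden states visited while reading the input side of seq."""
    h, states = np.zeros(H), []
    for x, _ in seq:
        h = np.tanh(W_x @ one_hot[x] + W_h @ h)
        states.append(h.copy())
    return states

# 1) Collect hidden states for every prefix of every training pair.
all_states, index = [], []
for seq in pairs:
    start = len(all_states)
    all_states.extend(run_rnn(seq))
    index.append((seq, start))

# 2) Cluster the hidden states; each cluster id becomes one FST state.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(np.stack(all_states))

# 3) Read off transitions (previous state, input symbol) -> (output symbol, next state).
transitions = {}
for seq, start in index:
    prev = "q_init"
    for t, (x, y) in enumerate(seq):
        nxt = int(labels[start + t])
        transitions[(prev, x)] = (y, nxt)
        prev = nxt

print(transitions)
```

In this toy setting the resulting transition table is deterministic because each (state, input) pair is seen with a single output; handling conflicts between training examples would require an additional resolution step not shown here.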