Finite-state transducers (FSTs) are effective models for string-to-string rewriting tasks and often provide the efficiency required by high-performance applications, but constructing transducers by hand is difficult. In this work, we propose a novel method for automatically constructing unweighted FSTs from the hidden-state geometry learned by a recurrent neural network. We evaluate our method on real-world datasets for morphological inflection, grapheme-to-phoneme prediction, and historical text normalization, showing that the constructed FSTs are highly accurate and robust on many datasets, outperforming classical transducer learning algorithms by up to 87% accuracy on held-out test sets.