We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMMs), in which the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. Our approach instead uses explicit, learnable probabilities for transitions between segments, as opposed to a blank label that implicitly encodes duration statistics. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We evaluate both the recognition results and the Viterbi alignments of our models. We find that while training the transition model does not improve recognition performance, it has a positive impact on alignment quality. The generated alignments are shown to be viable targets for state-of-the-art Viterbi training.
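To make the idea of summing over all label segmentations with explicit transition probabilities concrete, the following is a minimal CPU sketch in NumPy. It is not the GPU-based implementation described above. It assumes a strictly linear left-to-right topology with one state per label, no skips or silence states, and a single scalar loop probability and forward probability shared by all states, whereas a trained model may use state-dependent values. The function names `log_forward_score` and `viterbi_alignment` are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_forward_score(log_label_probs, labels, log_loop, log_fwd):
    """Log of the sum over all segmentations of `labels` onto T frames.

    log_label_probs: (T, V) array of per-frame label log-probabilities.
    labels: list of S label indices, with S <= T (assumed).
    log_loop, log_fwd: scalar log-probabilities for staying in a state
        and for moving to the next state (assumed shared by all states).
    """
    T = log_label_probs.shape[0]
    S = len(labels)
    emit = log_label_probs[:, labels]  # (T, S) emission scores per state
    alpha = np.full(S, -np.inf)
    alpha[0] = emit[0, 0]  # every path starts in the first state
    for t in range(1, T):
        stay = alpha + log_loop
        move = np.full(S, -np.inf)
        move[1:] = alpha[:-1] + log_fwd
        alpha = np.logaddexp(stay, move) + emit[t]
    return alpha[-1]  # every path ends in the last state


def viterbi_alignment(log_label_probs, labels, log_loop, log_fwd):
    """Best single segmentation, returned as one label per frame."""
    T = log_label_probs.shape[0]
    S = len(labels)
    emit = log_label_probs[:, labels]
    score = np.full(S, -np.inf)
    score[0] = emit[0, 0]
    came_from_prev = np.zeros((T, S), dtype=bool)  # backpointers
    for t in range(1, T):
        stay = score + log_loop
        move = np.full(S, -np.inf)
        move[1:] = score[:-1] + log_fwd
        came_from_prev[t] = move > stay
        score = np.maximum(stay, move) + emit[t]
    # Backtrace from the last state at the last frame.
    s = S - 1
    states = [s]
    for t in range(T - 1, 0, -1):
        if came_from_prev[t, s]:
            s -= 1
        states.append(s)
    states.reverse()
    return [labels[s] for s in states]
```

In this sketch the two functions differ only in replacing the log-sum with a maximum, which is the usual relation between full-sum training and Viterbi alignment. Gradients with respect to the label and transition scores would additionally require the backward pass or automatic differentiation, which is omitted here.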