Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods such as the Baum-Welch algorithm are computationally intensive and prone to local optima, while modern spectral algorithms offer provable guarantees but may produce probability estimates outside the valid range. This work introduces Belief Net, a differentiable filtering framework that learns HMM parameters by formulating the forward filter as a structured neural network and optimizing it with stochastic gradient descent. The architecture recursively updates the belief state, which is the posterior probability distribution over hidden states given the observation history. Unlike black-box transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix, so every parameter is directly interpretable. The model processes observation sequences with a decoder-only (causal) architecture and is trained end-to-end with a standard autoregressive next-observation prediction loss. On synthetic HMM data, Belief Net converges faster than Baum-Welch and successfully recovers parameters in both undercomplete and overcomplete settings, whereas spectral methods prove ineffective in the latter. Comparisons with transformer-based models on real-world language data are also presented.
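The filtering recursion and training objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `belief_filter_nll` are hypothetical, the softmax mapping from logits to probabilities is an assumption consistent with the "logits" wording, and the gradient step is omitted (in practice an automatic differentiation framework would differentiate this loss with respect to the three logit arrays).

```python
import math


def softmax(logits):
    """Map a list of logits to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def belief_filter_nll(init_logits, trans_logits, emit_logits, observations):
    """Run the HMM forward filter and score next-observation prediction.

    Assumed parametrization (illustrative only):
      init_logits[i]     -> initial distribution over hidden states
      trans_logits[i][j] -> P(next state = j | current state = i)
      emit_logits[i][o]  -> P(observation = o | state = i)
    Returns (mean negative log-likelihood, final belief state).
    """
    if not observations:
        raise ValueError("observations must be non-empty")

    pi = softmax(init_logits)
    A = [softmax(row) for row in trans_logits]
    B = [softmax(row) for row in emit_logits]
    n = len(pi)

    predicted = pi  # distribution over the current state before seeing its observation
    nll = 0.0
    for obs in observations:
        # joint[i] = P(state = i, obs | earlier observations)
        joint = [predicted[i] * B[i][obs] for i in range(n)]
        likelihood = sum(joint)  # P(obs | earlier observations)
        nll -= math.log(likelihood)
        # Belief state: posterior over hidden states given the history so far.
        belief = [p / likelihood for p in joint]
        # Propagate the belief through the transition matrix for the next step.
        predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
    return nll / len(observations), belief


# Example: 2 hidden states, 3 observation symbols, all logits zero (uniform model).
loss, belief = belief_filter_nll(
    [0.0, 0.0],
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [0, 2, 1, 1],
)
```

Because the only free quantities are the three logit arrays, the learned weights can be read off directly as an initial distribution, a transition matrix, and an emission matrix after applying the softmax.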