We describe libhmm, a C++20 library for hidden Markov model (HMM) parameter estimation, sequence decoding, and model selection. libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library suitable for embedding in production systems, and the widespread use of method-of-moments (MOM) approximations in the emission-distribution M-step of the Baum-Welch algorithm. The library implements correct maximum likelihood estimators (MLEs) for sixteen scalar emission distributions, including an ECME algorithm for the location-scale Student-t distribution, Newton-Raphson maximization for the Gamma, Beta, Weibull, and Negative Binomial distributions, and an estimator for the von Mises distribution for circular data. All forward-backward and Viterbi calculations operate entirely in log-space. SIMD acceleration is provided for AVX-512, AVX2, SSE2, and ARM NEON via compile-time dispatch, with a scalar fallback. Version 4 adds multivariate observation support via the BasicHmm<Obs> template, with three multivariate emission families (diagonal Gaussian, full-covariance Gaussian, and independent components), each with a correct weighted-MLE M-step. Python bindings are available via the companion package pylibhmm. We compare libhmm against established C and C++ HMM libraries and against published R reference packages on seven real-data benchmarks, and discuss the architectural tradeoffs made in the design.
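To illustrate what operating in log-space means for the forward pass, the following is a minimal sketch of a log-space forward recursion for a discrete-emission HMM. It is not libhmm's API: the function names `log_sum_exp` and `log_forward`, the nested-vector model layout, and the restriction to discrete symbols are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not part of libhmm): computes log(sum_i exp(v[i]))
// by factoring out the largest term so that exp() cannot overflow.
inline double log_sum_exp(const std::vector<double>& v) {
    if (v.empty()) return -std::numeric_limits<double>::infinity();
    const double m = *std::max_element(v.begin(), v.end());
    if (!std::isfinite(m)) return m;  // every term is -inf, or one is +inf
    double s = 0.0;
    for (double x : v) s += std::exp(x - m);
    return m + std::log(s);
}

// Hypothetical log-space forward recursion for a discrete-emission HMM.
//   log_pi[i]   : log initial probability of state i
//   log_A[i][j] : log transition probability from state i to state j
//   log_B[i][k] : log probability of emitting symbol k in state i
// Returns log P(obs | model) without ever forming a raw probability.
inline double log_forward(const std::vector<double>& log_pi,
                          const std::vector<std::vector<double>>& log_A,
                          const std::vector<std::vector<double>>& log_B,
                          const std::vector<int>& obs) {
    const std::size_t n = log_pi.size();
    if (obs.empty()) return 0.0;  // the empty sequence has probability 1

    std::vector<double> alpha(n), next(n), terms(n);

    // Initialization: alpha_1(i) = pi_i * b_i(o_1), in logs.
    for (std::size_t i = 0; i < n; ++i)
        alpha[i] = log_pi[i] + log_B[i][obs[0]];

    // Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t), in logs.
    for (std::size_t t = 1; t < obs.size(); ++t) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t i = 0; i < n; ++i)
                terms[i] = alpha[i] + log_A[i][j];
            next[j] = log_sum_exp(terms) + log_B[j][obs[t]];
        }
        std::swap(alpha, next);
    }

    // Termination: log P(obs) = log sum_i alpha_T(i).
    return log_sum_exp(alpha);
}
```

The point of the sketch is numerical: a linear-space forward pass multiplies probabilities at every step and underflows to zero on long sequences unless it is rescaled, whereas the log-space version only adds log-probabilities and applies log-sum-exp at each marginalization.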