Haze and dust pollution events have significant adverse impacts on human health and ecosystems. The interactions between their formation and their impacts are complex, which creates substantial modeling and computational challenges for joint classification. To address the state-space explosion that conventional Hidden Markov Models (HMMs) face in multivariate dynamic settings, this study develops a classification framework based on the Factorial Hidden Markov Model (FHMM). The framework assumes statistical independence across multiple latent chains and applies the Walsh-Hadamard transform to reduce computational and memory costs. A Gaussian copula decouples the marginal distributions from the dependence structure to capture nonlinear correlations among meteorological and pollution indicators. Algorithmically, mutual information weights the observational variables to increase the sensitivity of Viterbi decoding to salient features, and a single global weight hyperparameter balances the emission and transition contributions in the decoding objective. In an empirical application, the model attains a Micro-F1 of 0.9459; for the low-frequency classes Dust (prevalence below 1\%) and Haze (prevalence below 10\%), the F1-scores improve from 0.19 and 0.32 under a baseline FHMM to 0.75 and 0.68, respectively. The framework provides a scalable pathway for statistical modeling of complex air-pollution events and supplies quantitative evidence for decision-making in outdoor activity management and fine-grained environmental governance.
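The weighted decoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the form used here is an assumption: per-feature emission log-likelihoods are combined with nonnegative feature weights (standing in for mutual-information weights), and a hypothetical global weight `lam` places `lam` on the emission term and `1 - lam` on the transition term. The sketch decodes a single chain rather than a full factorial model, and all names (`weighted_viterbi`, `feat_weights`, `lam`) are illustrative rather than taken from the paper.

```python
import numpy as np

def weighted_viterbi(log_pi, log_A, log_B_feat, feat_weights, lam):
    """Viterbi decoding with feature-weighted emissions (illustrative sketch).

    log_pi:       (K,) initial state log-probabilities
    log_A:        (K, K) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B_feat:   (T, D, K) log-likelihood of feature d at time t under state k
    feat_weights: (D,) nonnegative feature weights (assumed: e.g. normalized
                  mutual information between each feature and the labels)
    lam:          assumed global weight in [0, 1]; lam scales the emission
                  term and (1 - lam) scales the prior/transition term
    """
    # Weighted emission score per time step and state: sum_d w_d * log p(x_td | k)
    log_B = np.einsum("d,tdk->tk", feat_weights, log_B_feat)
    T, K = log_B.shape

    delta = np.empty((T, K))            # best score of any path ending in state k
    psi = np.zeros((T, K), dtype=int)   # back-pointers to the best previous state

    delta[0] = (1.0 - lam) * log_pi + lam * log_B[0]
    for t in range(1, T):
        # scores[i, j]: best path ending in i at t-1, then moving to j at t
        scores = delta[t - 1][:, None] + (1.0 - lam) * log_A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + lam * log_B[t]

    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy example: two states, one feature, three time steps.
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])          # sticky transitions
log_B_feat = np.log([[[0.6, 0.4]], [[0.4, 0.6]], [[0.6, 0.4]]])
w = np.array([1.0])

emission_only = weighted_viterbi(log_pi, log_A, log_B_feat, w, lam=1.0)
balanced = weighted_viterbi(log_pi, log_A, log_B_feat, w, lam=0.5)
```

In this toy setting, `lam=1.0` ignores the transitions and follows the per-step emission maximum, while `lam=0.5` lets the sticky transitions smooth out the single contrary observation. Under the assumed objective, a larger `lam` makes decoding more responsive to the weighted features, which is one way rare classes could become easier to recover.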