Identifiability of discrete statistical models with latent variables is known to be challenging to study, yet it is crucial to a model's interpretability and reliability. This work presents a general algebraic technique for investigating the identifiability of discrete models with latent and graphical components. Specifically, motivated by diagnostic tests that collect multivariate categorical data, we focus on discrete models with multiple binary latent variables. We consider the BLESS model, in which the latent variables can have arbitrary dependencies among themselves while the latent-to-observed measurement graph takes a "star-forest" shape. We establish necessary and sufficient graphical criteria for identifiability, and reveal an interesting and perhaps surprising "blessing-of-dependence" geometry: under the minimal conditions for generic identifiability, the parameters are identifiable if and only if the latent variables are not statistically independent. Thanks to this theory, we can perform formal hypothesis tests of identifiability in the boundary case by testing marginal independence of the observed variables. In addition to the BLESS model, we also use the technique to show identifiability and the blessing-of-dependence geometry for a more flexible model, which has a general measurement graph beyond a star forest. Our results give a new understanding of the statistical properties of graphical models with latent variables. They also have useful implications for designing diagnostic tests or surveys that measure binary latent traits.
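The abstract's remark that identifiability in the boundary case can be tested through marginal independence of the observed variables rests on a simple link: in a star-forest measurement graph, two observed variables with different latent parents can be marginally dependent only if their parents are dependent. The following is a minimal population-level sketch of that link for two binary latent variables with one binary child each. It is an illustration under stated assumptions, not the paper's test procedure: the function name `observed_covariance` and all parameter values (`theta1`, `theta2`, and the two latent distributions) are hypothetical choices made for this example.

```python
from itertools import product


def observed_covariance(latent_joint, theta1, theta2):
    """Return Cov(Y1, Y2) for binary Y1 (child of A1) and Y2 (child of A2).

    latent_joint[(a1, a2)] = P(A1 = a1, A2 = a2)
    theta1[a] = P(Y1 = 1 | A1 = a)
    theta2[a] = P(Y2 = 1 | A2 = a)
    Y1 and Y2 are assumed conditionally independent given (A1, A2).
    """
    p1 = p2 = p11 = 0.0
    for a1, a2 in product((0, 1), repeat=2):
        w = latent_joint[(a1, a2)]
        p1 += w * theta1[a1]                 # contributes to P(Y1 = 1)
        p2 += w * theta2[a2]                 # contributes to P(Y2 = 1)
        p11 += w * theta1[a1] * theta2[a2]   # contributes to P(Y1 = 1, Y2 = 1)
    return p11 - p1 * p2


# Hypothetical item parameters: each child responds differently to its parent.
theta1 = {0: 0.2, 1: 0.8}
theta2 = {0: 0.3, 1: 0.9}

# Latent distribution with A1 and A2 independent (product of marginals).
independent = {
    (a1, a2): (0.4 if a1 else 0.6) * (0.7 if a2 else 0.3)
    for a1, a2 in product((0, 1), repeat=2)
}

# Latent distribution with A1 and A2 positively dependent.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

cov_indep = observed_covariance(independent, theta1, theta2)
cov_dep = observed_covariance(dependent, theta1, theta2)
```

For binary variables, zero covariance is equivalent to independence, so `cov_indep` vanishing (up to floating-point error) means the two observed variables are marginally independent, while a nonzero `cov_dep` means they are not. With real data the population covariance would be replaced by a sample-based independence test on the corresponding contingency table.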