Identifiability of discrete statistical models with latent variables is challenging to study, yet crucial to a model's interpretability and reliability. This work presents a general algebraic technique for investigating the identifiability of complicated discrete models with latent and graphical components. Specifically, motivated by diagnostic tests that collect multivariate categorical data, we focus on discrete models with multiple binary latent variables. In the considered model, the latent variables can have arbitrary dependencies among themselves, while the latent-to-observed measurement graph takes a "star-forest" shape. We establish necessary and sufficient graphical criteria for identifiability, and reveal an interesting and perhaps surprising phenomenon of blessing-of-dependence geometry: under the minimal conditions for generic identifiability, the parameters are identifiable if and only if the latent variables are not statistically independent. Thanks to this theory, we can perform formal hypothesis tests of identifiability in the boundary case by testing certain marginal independence relations among the observed variables. Our results provide new insight into the statistical properties of graphical models with latent variables. They also have useful implications for designing diagnostic tests or surveys that measure binary latent traits.
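To make the testing idea concrete, the following is a minimal sketch of a marginal independence test between two observed variables. It is an illustration under stated assumptions, not the procedure developed in this work: it assumes both observed responses are binary (coded 0/1), that the two items measure different latent variables, and that a Pearson chi-square test is an acceptable stand-in for the formal test. The function name `marginal_independence_pvalue` is hypothetical.

```python
import math
from collections import Counter


def marginal_independence_pvalue(x, y):
    """Pearson chi-square test of independence for two binary (0/1) samples.

    Assumption for illustration: x and y are responses to two items that
    measure different latent variables. Returns (statistic, p_value).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    counts = Counter(zip(x, y))  # 2x2 contingency table; missing cells count as 0

    # Marginal totals for each variable.
    row = [counts[(a, 0)] + counts[(a, 1)] for a in (0, 1)]
    col = [counts[(0, b)] + counts[(1, b)] for b in (0, 1)]
    if 0 in row or 0 in col:
        raise ValueError("each variable must take both values 0 and 1")

    # Pearson statistic: sum over cells of (observed - expected)^2 / expected.
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row[a] * col[b] / n
            stat += (counts[(a, b)] - expected) ** 2 / expected

    # A 2x2 table has one degree of freedom, and the chi-square(1) tail
    # probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Usage: identical responses are strongly dependent, so the p-value is tiny.
stat, p = marginal_independence_pvalue([0, 1] * 100, [0, 1] * 100)
print(stat, p)
```

In this sketch, a small p-value is evidence against marginal independence of the two observed variables. Under the result stated above, that points toward dependent latent variables and hence identifiability in the boundary case. Failing to reject independence does not establish non-identifiability.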