High-dimensional categorical data arise in diverse scientific domains and are often accompanied by covariates. Latent class regression models are routinely used in such settings, reducing dimensionality by assuming conditional independence of the categorical variables given a single latent class that depends on covariates through a logistic regression model. However, such methods become unreliable as the dimensionality increases. To address this, we propose a flexible family of deep latent class models. Our model satisfies key theoretical properties, including identifiability and posterior consistency, and we establish a Bayes oracle clustering property that ensures robustness against the curse of dimensionality. We develop efficient posterior computation methods, validate them through simulation studies, and apply our model to joint species distribution modeling in ecology.
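For concreteness, the standard latent class regression model referred to above can be sketched as follows. The notation is assumed here for illustration and does not come from the source: $y_1, \dots, y_p$ are the categorical variables, with $y_j$ taking values in $\{1, \dots, d_j\}$; $x$ is the covariate vector; $z \in \{1, \dots, K\}$ is the single latent class; $\beta_k$ are the logistic regression coefficients; and $\theta_{jkc}$ is the probability that $y_j = c$ within class $k$. The proposed deep extension is not shown.

```latex
\begin{align}
  % Latent class membership depends on covariates through a
  % multinomial logistic regression (commonly with \beta_K = 0 as baseline).
  P(z = k \mid x)
    &= \frac{\exp\!\left(x^\top \beta_k\right)}
            {\sum_{l=1}^{K} \exp\!\left(x^\top \beta_l\right)},
    \qquad k = 1, \dots, K, \\[4pt]
  % Conditional independence of the categorical variables given the class,
  % marginalized over the latent class.
  P(y_1, \dots, y_p \mid x)
    &= \sum_{k=1}^{K} P(z = k \mid x)
       \prod_{j=1}^{p} \prod_{c=1}^{d_j} \theta_{jkc}^{\,\mathbf{1}(y_j = c)} .
\end{align}
```

Under this form, the joint distribution over $\prod_{j} d_j$ possible outcomes is described by the $K$ sets of class-specific probabilities $\theta_{jkc}$ plus the regression coefficients, which is the dimensionality reduction the abstract describes.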