High-dimensional categorical data arise in diverse scientific domains and are often accompanied by covariates. Latent class regression models are routinely used in such settings, reducing dimensionality by assuming conditional independence of the categorical variables given a single latent class that depends on covariates through a logistic regression model. However, such methods become unreliable as the dimensionality increases. To address this, we propose Bayesian latent class regression with interpretable binary profiles (BLIP), a flexible family of models that introduces a binary latent-attribute layer between the covariate-dependent latent class and the observed categorical responses. BLIP satisfies key theoretical properties, including identifiability and posterior consistency, and we establish a Bayes oracle clustering property that ensures robustness against the curse of dimensionality. We develop efficient posterior computation methods, validate them through simulation studies, and use BLIP to infer regions of common profile in ecological data.