The latent class model has been proposed as a powerful tool for cluster analysis of categorical data in fields such as the social, psychological, behavioral, and biological sciences. However, one important limitation of the latent class model is that it is suitable only for data with binary responses, so it cannot model real-world data with continuous or negative responses. In many applications, ignoring the weights of such responses discards potentially valuable information. To address this limitation, we propose a novel generative model, the weighted latent class model (WLCM). Our model allows the response matrix of the data to be generated from an arbitrary distribution with a latent class structure. Compared with the latent class model, the WLCM is more realistic and more general. To our knowledge, the WLCM is the first model for latent class analysis with weighted responses. We investigate the identifiability of the model and propose an efficient algorithm for estimating the latent classes and other model parameters. We show that the proposed algorithm yields consistent estimates. The performance of the proposed algorithm is investigated using both computer-generated and real-world weighted response data.
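To make the idea of a weighted response matrix with a latent class structure concrete, the following is a minimal simulation sketch, not the model specification or the estimation algorithm of this work. It assumes that each subject belongs to exactly one latent class, that the expected response of a subject to an item depends only on the subject's class, and that the noise is Gaussian; the Gaussian choice, the function name `simulate_weighted_responses`, and all parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_weighted_responses(n_subjects, n_items, n_classes,
                                noise_sd=1.0, seed=0):
    """Simulate a real-valued response matrix with latent class structure.

    Illustrative assumptions (not taken from the source):
    - each subject has exactly one latent class label;
    - theta[j, k] is the expected response of class k to item j;
    - responses are the class-level expectation plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)

    # Latent class label of each subject, drawn uniformly at random.
    labels = rng.integers(0, n_classes, size=n_subjects)

    # Item parameters; negative values are allowed, unlike in binary models.
    theta = rng.uniform(-2.0, 2.0, size=(n_items, n_classes))

    # Expected response matrix, shape (n_subjects, n_items).
    expected = theta[:, labels].T

    # Observed responses: continuous and possibly negative.
    responses = expected + rng.normal(0.0, noise_sd, size=expected.shape)
    return responses, labels, theta


if __name__ == "__main__":
    R, labels, theta = simulate_weighted_responses(300, 10, 3)
    # Per-class column means of R should be close to the columns of theta.
    for k in range(3):
        print(k, np.round(R[labels == k].mean(axis=0) - theta[:, k], 2))
```

In this sketch, averaging the rows of the response matrix within each class recovers the corresponding column of `theta` up to noise, which is the kind of structure a latent class estimator would exploit when the labels are unknown.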