StepMix is an open-source Python package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many applications in the social sciences, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. Such models are generally divided into a measurement model, which relates the latent classes to observed indicators, and a structural model, which relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach, or sequentially using stepwise methods, which offer practitioners significant advantages in the interpretability of the estimated latent classes. In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with Bolck-Croon-Hagenaars (BCH) and maximum likelihood corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption in the data science community, StepMix follows the object-oriented design of the scikit-learn library and also provides an R wrapper.
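To make the stepwise idea concrete, the following is a minimal NumPy sketch of a three-step analysis with a continuous distal outcome. It is not StepMix's implementation or API; the function names `fit_lca_bernoulli` and `three_step_distal_means` are hypothetical, and the model is restricted to binary indicators and two simulated classes for brevity. The sketch proceeds in three steps:

1. Fit the measurement model alone by expectation-maximization.
2. Assign each individual to their modal class.
3. Estimate class-specific outcome means, either naively or with BCH weights built from the inverse of the estimated classification error matrix.

The maximum likelihood correction and the two-step approach are not shown.

```python
import numpy as np


def fit_lca_bernoulli(X, n_classes, n_iter=200, seed=0):
    """Step 1 (hypothetical helper): EM for a latent class model with binary indicators only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)            # P(class = k)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, d))     # P(X_j = 1 | class = k)
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities, computed in log space
        log_post = X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T + np.log(weights)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update class proportions and item-response probabilities
        nk = resp.sum(axis=0)
        weights = nk / n
        probs = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return weights, probs, resp


def three_step_distal_means(resp, y, correction=None):
    """Steps 2-3 (hypothetical helper): modal assignment, then class means of a distal outcome."""
    n_classes = resp.shape[1]
    labels = resp.argmax(axis=1)              # Step 2: modal class assignment
    hard = np.eye(n_classes)[labels]          # one-hot assigned classes, shape (n, K)
    if correction is None:
        w = hard                              # naive: ignores classification error
    elif correction == "BCH":
        # Classification error matrix D[t, s] = P(assigned = s | true class = t)
        D = (resp.T @ hard) / resp.sum(axis=0)[:, None]
        # BCH weights: a unit assigned to class s gets weight inv(D)[s, t] for class t
        w = np.linalg.inv(D)[labels]
    else:
        raise ValueError("correction must be None or 'BCH'")
    # Step 3: weighted class-specific means of y
    return (w * y[:, None]).sum(axis=0) / w.sum(axis=0)


# Simulated data: two latent classes, six binary indicators, one continuous distal outcome
rng = np.random.default_rng(42)
n = 2000
true_class = rng.binomial(1, 0.6, size=n)
item_p = np.where(true_class == 1, 0.8, 0.2)
X = (rng.random((n, 6)) < item_p[:, None]).astype(float)
y = rng.normal(np.where(true_class == 1, 2.0, 0.0), 1.0)   # true class means: 0.0 and 2.0

weights, probs, resp = fit_lca_bernoulli(X, n_classes=2)
naive = np.sort(three_step_distal_means(resp, y))           # sorted to undo label switching
bch = np.sort(three_step_distal_means(resp, y, correction="BCH"))
print("naive:", naive, "BCH:", bch)
```

The naive estimates are pulled toward each other because misclassified individuals contaminate each assigned class. The BCH weights, some of which can be negative, undo this contamination under the assumption that the assigned class is independent of the outcome given the true class.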