StepMix is an open-source software package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many social science applications, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. These models generally consist of a measurement model that relates the latent classes to observed indicators, and a structural model that relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach, or sequentially using stepwise methods, which offer practitioners significant advantages regarding the interpretability of the estimated latent classes. In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with BCH and ML corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption in the data science community, StepMix follows the object-oriented design of the scikit-learn library and provides interfaces in both Python and R.
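To make the distinction between the measurement and structural models concrete, the following is a minimal sketch of a naive three-step estimator written directly in NumPy. It is not StepMix's API or implementation: the function names (`posterior`, `fit_measurement`), the simulated data, and the choice of two classes with binary indicators and a Gaussian distal outcome are all assumptions made for illustration, and the bias corrections (BCH, ML) mentioned above are deliberately omitted.

```python
import numpy as np


def posterior(X, weights, probs):
    """Posterior class probabilities P(class | indicators) under a Bernoulli model."""
    log_joint = X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T + np.log(weights)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def fit_measurement(X, n_iter=200):
    """Step 1 (hypothetical helper): EM for a 2-class model with binary indicators."""
    d = X.shape[1]
    weights = np.array([0.5, 0.5])                          # class proportions
    probs = np.vstack([np.full(d, 0.6), np.full(d, 0.4)])   # P(item = 1 | class)
    for _ in range(n_iter):
        post = posterior(X, weights, probs)                 # E-step
        weights = post.mean(axis=0)                         # M-step: proportions
        probs = (post.T @ X) / post.sum(axis=0)[:, None]    # M-step: item probabilities
        probs = np.clip(probs, 1e-6, 1 - 1e-6)              # keep logs finite
    return weights, probs


# Simulated data (assumption): 2 latent classes, 6 binary indicators, 1 distal outcome.
rng = np.random.default_rng(0)
n, d = 400, 6
z = rng.integers(0, 2, size=n)                              # true latent class
item_probs = np.where(z[:, None] == 0, 0.9, 0.1)            # class 0 endorses items often
X = (rng.random((n, d)) < item_probs).astype(float)
y = rng.normal(loc=3.0 * z, scale=1.0)                      # outcome mean 0 vs. 3

# Step 1: estimate the measurement model from the indicators only.
weights, probs = fit_measurement(X)

# Step 2: assign each unit to its most likely class (modal assignment).
labels = posterior(X, weights, probs).argmax(axis=1)

# Step 3: estimate the structural model (class-specific outcome means),
# treating the assignments as known. This naive step ignores classification error.
means = np.array([y[labels == k].mean() for k in range(2)])
print("class proportions:", weights)
print("outcome means by class:", means)
```

Because step 3 treats the assigned classes as if they were observed, misclassified units pull the class-specific estimates toward each other. The bias-adjusted three-step methods address this by accounting for the classification error, whereas the one-step approach avoids it by estimating both models jointly.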