Heterogeneous data from multiple populations, sub-groups, or sources is often represented as a ``mixture model'' in which a single latent class influences all of the observed covariates. Heterogeneity can be resolved at multiple levels by grouping populations according to different notions of similarity. This paper proposes grouping populations by the system's causal response to an intervention or perturbation. This definition is distinct from previous notions, such as similar covariate values (e.g., clustering) or similar correlations between covariates (e.g., Gaussian mixture models). To solve the problem, we ``synthetically sample'' from a counterfactual distribution using higher-order multi-linear moments of the observable data. To understand how these ``causal mixtures'' relate to more classical notions, we develop a hierarchy of mixture identifiability.