Causal discovery from observational data is a challenging and often impossible task. However, the causal structure can be estimated under certain assumptions on the data-generating process. Many commonly used methods rely on the additivity of the noise in the structural equation models. Additivity implies that the variance or the tail of the effect, given the causes, is invariant; the cause affects only the mean. In many applications, it is desirable to model the tail or other characteristics of the random variable, since they can provide different information about the causal structure. However, models for causal inference in such cases have received very little attention. It has been shown that the causal graph is identifiable under different models, such as linear non-Gaussian, post-nonlinear, or quadratic variance functional models. We introduce a new class of models called Conditional Parametric Causal Models (CPCM), in which the cause affects the effect through some of the characteristics of interest. We use the concept of sufficient statistics to show the identifiability of CPCM, focusing mostly on the exponential family of conditional distributions. We also propose an algorithm for estimating the causal structure from a random sample under CPCM. Its empirical properties are studied on various data sets, including an application to the expenditure behavior of residents of the Philippines.
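To make the contrast between additive noise and a cause that acts on another characteristic concrete, the following is a minimal simulation sketch. It is not the estimation algorithm proposed in the paper. The specific choices are assumptions made for illustration only: a linear additive model `Y = 2X + noise`, and a Gaussian conditional model in which `X` changes only the variance of `Y` (one possible instance of a conditional distribution from the exponential family). The helper names `simulate` and `residual_variance_by_half` are hypothetical.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Draw one cause X and two effects generated under different (assumed) models."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Additive noise: X shifts only the conditional mean of Y.
    y_additive = 2.0 * x + rng.normal(size=n)
    # Illustrative non-additive case: Y | X = x ~ N(0, exp(x)),
    # so X changes the conditional variance and leaves the mean at 0.
    y_scale = rng.normal(size=n) * np.exp(0.5 * x)
    return x, y_additive, y_scale


def residual_variance_by_half(x, y):
    """Remove a linear fit in x, then compare residual variance below/above the median of x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    low = x < np.median(x)
    return resid[low].var(), resid[~low].var()


x, y_additive, y_scale = simulate()
add_low, add_high = residual_variance_by_half(x, y_additive)
sc_low, sc_high = residual_variance_by_half(x, y_scale)

print(f"additive noise: var(low X) = {add_low:.2f}, var(high X) = {add_high:.2f}")
print(f"variance model: var(low X) = {sc_low:.2f}, var(high X) = {sc_high:.2f}")
```

Under the additive model the two residual variances should be close to each other, whereas in the second model the residual variance grows with `X`. A method that assumes additivity would treat that dependence as a model violation instead of using it as information about the causal structure.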