Causal discovery from observational data is a challenging, often impossible, task. However, the causal structure can be estimated under certain assumptions on the data-generating process. Many commonly used methods rely on the additivity of the noise in the structural equation models. Additivity implies that the variance or the tail of the effect, given the causes, is invariant; the causes affect only the mean. However, the tail or other characteristics of the random variable can provide different information about the causal structure, and such cases have received little attention in the literature. It has been shown that the causal graph is identifiable under several models, such as linear non-Gaussian, post-nonlinear, or quadratic variance functional models. We introduce a new class of models, the Conditional Parametric Causal Models (CPCM), in which the cause affects some characteristics of interest of the effect. We use sufficient statistics to show the identifiability of CPCMs when the conditional distributions belong to the exponential family. We also propose an algorithm for estimating the causal structure from a random sample under CPCM. Its empirical properties are studied on various data sets, including an application to the expenditure behavior of residents of the Philippines.
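To illustrate the kind of dependence that additive-noise methods overlook, the following minimal sketch simulates a cause that changes only the conditional variance of the effect and leaves its conditional mean untouched. The model Y | X ~ N(0, exp(X)), the function name `simulate_variance_effect`, and the sample size are assumptions chosen for illustration; this is not the CPCM estimation algorithm itself.

```python
import numpy as np


def simulate_variance_effect(n=20000, seed=0):
    """Draw (X, Y) where X changes only the conditional variance of Y.

    Illustrative assumption: X ~ N(0, 1) and Y | X ~ N(0, exp(X)).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Multiplying standard normal noise by exp(x / 2) gives variance exp(x).
    y = rng.normal(size=n) * np.exp(x / 2.0)
    return x, y


x, y = simulate_variance_effect()

# Split the sample by the sign of the cause and compare summaries of the effect.
low, high = y[x < 0], y[x >= 0]
mean_gap = abs(high.mean() - low.mean())  # close to zero: the mean is unaffected
var_ratio = high.var() / low.var()        # well above one: the variance is affected

print(f"difference in conditional means: {mean_gap:.3f}")
print(f"ratio of conditional variances:  {var_ratio:.2f}")
```

In this toy setting a method that inspects only the conditional mean sees no dependence of Y on X, whereas the conditional variance shows it clearly, which is the situation the abstract describes.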