Reinterpreting causal discovery as the task of predicting unobserved joint statistics

From arXiv, 43 pages. This preprint is heavily based on arXiv:1804.03206, with many new thoughts and a better title. We wanted to keep the old one searchable under the old title.

If $X,Y,Z$ denote sets of random variables, two different data sources may contain samples from $P_{X,Y}$ and $P_{Y,Z}$, respectively. We argue that causal discovery can help to infer properties of the `unobserved joint distributions' $P_{X,Y,Z}$ or $P_{X,Z}$. The properties may be conditional independences (as in `integrative causal inference') or quantitative statements about dependences. More generally, we define a learning scenario where the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations, which then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions. Here, causal discovery becomes more modest and more accessible to empirical tests than usual: rather than trying to find a causal hypothesis that is `true', a causal hypothesis is {\it useful} whenever it correctly predicts statistical properties of unobserved joint distributions. This way, a sparse causal graph that omits weak influences may be more useful than a dense one (despite being less accurate) because it is able to reconstruct the full joint distribution from marginal distributions of smaller subsets. Within such a `pragmatic' application of causal discovery, some popular heuristic approaches become justified in retrospect. For instance, it is legitimate to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations.
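
To make the idea of predicting an unobserved joint statistic concrete, here is a minimal sketch under strong assumptions that are not taken from the preprint: a linear chain $X \to Y \to Z$ with Gaussian noise, and the correlation coefficient as the statistical property to predict. Any DAG in which the partial correlation of $X$ and $Z$ given $Y$ vanishes entails $\rho_{XZ} = \rho_{XY}\,\rho_{YZ}$, so a causal hypothesis learned from the two observed pairs yields a testable prediction for the pair that was never observed jointly. The function name `predict_xz_correlation`, the coefficients, and the sample sizes are hypothetical choices for illustration only.

```python
import numpy as np


def predict_xz_correlation(r_xy: float, r_yz: float) -> float:
    """Predict corr(X, Z) from corr(X, Y) and corr(Y, Z).

    Assumes the causal hypothesis entails a vanishing partial
    correlation of X and Z given Y (e.g. the chain X -> Y -> Z),
    in which case r_xz = r_xy * r_yz.
    """
    return r_xy * r_yz


# Hypothetical ground truth: linear Gaussian chain X -> Y -> Z.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = -0.5 * y + rng.normal(size=n)

# Two data sources with disjoint samples:
# source 1 observes only (X, Y), source 2 observes only (Y, Z).
half = n // 2
r_xy = np.corrcoef(x[:half], y[:half])[0, 1]
r_yz = np.corrcoef(y[half:], z[half:])[0, 1]

# Prediction for the pair (X, Z), which no source observed jointly.
r_xz_pred = predict_xz_correlation(r_xy, r_yz)

# Empirical test: only possible here because the data are simulated.
r_xz_true = np.corrcoef(x, z)[0, 1]

print(f"predicted corr(X, Z): {r_xz_pred:.3f}")
print(f"empirical corr(X, Z): {r_xz_true:.3f}")
```

In the language of the abstract, the pairs $(X,Y)$ and $(Y,Z)$ play the role of training points and $(X,Z)$ is a test point; the hypothesis counts as useful to the extent that the predicted and empirical values agree.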
