Discovering the underlying structure of causal relations from observational studies poses a great challenge in scientific research where randomized trials or intervention-based studies are infeasible. This challenge stems from the lack of prior knowledge, in observational studies, about which variable plays the role of cause and which plays the role of effect. Leveraging Shannon's seminal work on information theory, we propose a new conceptual framework of asymmetry in which any causal link between a putative cause and effect is captured by unequal information flows from one variable to the other. We present an entropy-based asymmetry coefficient that not only enables us to assess whether one variable is a stronger predictor of the other, but also detects an imprint of the underlying causal relation in observational studies. Our causal discovery method naturally accommodates low-dimensional confounders. The proposed methodology relies on scalable non-parametric density estimation using the fast Fourier transform, making the resulting estimation method many times faster than classical bandwidth-based density estimation while maintaining comparable mean integrated squared error rates. We investigate key asymptotic properties of our methodology and use a data-splitting and cross-fitting technique to facilitate inference on the direction of causal relations. We illustrate the performance of our methodology through simulation studies and real data examples.
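The abstract does not spell out the exact asymmetry coefficient or density estimator, so the Python sketch below is illustrative only and is not the authors' method. It assumes a Gaussian kernel, linear binning onto an equispaced grid, and Silverman's rule-of-thumb bandwidth, and it uses a stand-in score, H(Y|X) − H(X|Y) on standardized variables, which reduces to H(Y) − H(X) because the joint entropy cancels. The function names `fft_kde`, `grid_entropy`, and `entropy_asymmetry` are hypothetical, and confounder adjustment and cross-fitting are not covered.

```python
import numpy as np


def fft_kde(x, n_grid=1024, bandwidth=None):
    """Gaussian KDE on an equispaced grid via linear binning + FFT convolution.

    Returns (grid, density). Cost is O(n + n_grid log n_grid) rather than the
    O(n * n_grid) of summing every kernel at every grid point.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb; the paper's bandwidth choice may differ.
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-0.2)
    lo, hi = x.min() - 4 * bandwidth, x.max() + 4 * bandwidth
    grid = np.linspace(lo, hi, n_grid)
    delta = grid[1] - grid[0]

    # Linear binning: split each observation's unit mass between its two nearest grid points.
    pos = (x - lo) / delta
    left = np.floor(pos).astype(int)
    frac = pos - left
    counts = np.zeros(n_grid)
    np.add.at(counts, left, 1.0 - frac)
    np.add.at(counts, np.minimum(left + 1, n_grid - 1), frac)

    # Kernel sampled at signed grid offsets, laid out for a zero-padded circular
    # convolution of length 2 * n_grid so that no wrap-around reaches the output.
    m = 2 * n_grid
    idx = np.arange(m)
    offsets = np.where(idx < n_grid, idx, idx - m) * delta
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    padded = np.zeros(m)
    padded[:n_grid] = counts

    dens = np.fft.irfft(np.fft.rfft(padded) * np.fft.rfft(kernel), n=m)[:n_grid] / n
    return grid, np.maximum(dens, 0.0)  # clip tiny negative round-off from the FFT


def grid_entropy(grid, dens):
    """Plug-in differential entropy -∫ f log f, approximated by a Riemann sum on the grid."""
    delta = grid[1] - grid[0]
    f = dens[dens > 0]
    return -np.sum(f * np.log(f)) * delta


def entropy_asymmetry(x, y, n_grid=1024):
    """Hypothetical stand-in score H(Y|X) - H(X|Y) for standardized X and Y.

    Since H(Y|X) = H(X,Y) - H(X) and H(X|Y) = H(X,Y) - H(Y), the joint entropy
    cancels and the score equals H(Y) - H(X). A negative value means X leaves less
    uncertainty about Y than Y leaves about X (X is the stronger predictor).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    hx = grid_entropy(*fft_kde(zx, n_grid))
    hy = grid_entropy(*fft_kde(zy, n_grid))
    return hy - hx
```

For example, with `x` drawn from a normal distribution and `y` from a uniform distribution, `entropy_asymmetry(x, y)` is negative, because a standardized uniform variable has lower differential entropy than a standard normal one. The FFT step is what keeps the density estimate cheap: the data are binned once, and a single convolution replaces the sum of n kernels at every grid point.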