Causal investigations in observational studies pose a great challenge in scientific research where randomized trials or intervention-based studies are not feasible. Leveraging Shannon's seminal work on information theory, we consider a framework of asymmetry in which any causal link between a putative cause and its effect must be explained through a mechanism governing the cause as well as a generative process yielding the effect from the cause. Under weak assumptions, this framework enables the assessment of whether X is a stronger predictor of Y or vice versa. Under stronger identifiability assumptions, our framework is able to distinguish between cause and effect using observational data. We establish key statistical properties of this framework. Our proposed methodology relies on scalable non-parametric density estimation using the fast Fourier transform. The resulting estimation method is many times faster than classical bandwidth-based density estimation while maintaining comparable mean integrated squared error rates. We investigate key asymptotic properties of our methodology and introduce a data-splitting technique to facilitate inference. The key attraction of our framework is its inference toolkit, which allows researchers to quantify uncertainty in causal discovery findings. We illustrate the performance of our methodology through simulation studies as well as multiple real data examples.
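To make the idea of Fourier-based density estimation concrete, the following is a minimal sketch of a generic binned Gaussian kernel density estimator computed with the FFT. It is not the estimator proposed in this work: the function name `fft_kde`, the grid size, the padding, and the use of Silverman's rule of thumb for the bandwidth are all assumptions made for illustration. The point is only that, once the data are binned on a regular grid, the kernel smoothing becomes a convolution that costs O(m log m) in the grid size m instead of one kernel evaluation per (sample, grid point) pair.

```python
import numpy as np


def fft_kde(samples, grid_size=1024, bandwidth=None, pad=3.0):
    """Binned Gaussian KDE on a regular grid, smoothed via FFT (illustrative)."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb, chosen only for this sketch.
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)

    # Extend the grid a few bandwidths beyond the data range.
    lo = x.min() - pad * bandwidth
    hi = x.max() + pad * bandwidth

    # Step 1: bin the samples onto a regular grid.
    counts, edges = np.histogram(x, bins=grid_size, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    delta = edges[1] - edges[0]

    # Step 2: convolve the counts with the Gaussian kernel in the Fourier
    # domain. Zero-padding to twice the grid size limits circular wrap-around.
    m = 2 * grid_size
    freqs = np.fft.rfftfreq(m, d=delta)  # frequencies in cycles per unit of x
    # Fourier transform of a Gaussian density with standard deviation h.
    kernel_ft = np.exp(-2.0 * (np.pi * freqs * bandwidth) ** 2)
    smoothed = np.fft.irfft(np.fft.rfft(counts, n=m) * kernel_ft, n=m)

    # Step 3: keep the original grid and rescale the counts to a density.
    density = np.maximum(smoothed[:grid_size], 0.0) / (n * delta)
    return centers, density


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid, dens = fft_kde(rng.normal(size=5000))
    # The estimate should integrate to roughly one over the grid.
    print(dens.sum() * (grid[1] - grid[0]))
```

The cost of the smoothing step depends on the grid size rather than on the sample size, which is the usual reason FFT-based estimators scale well; the price is a small binning error that shrinks as the grid is refined.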