Causal investigation in observational studies poses a major challenge in scientific research, particularly where randomized trials or intervention-based studies are not feasible. Leveraging Shannon's seminal work on information theory, we consider a framework of asymmetry in which any causal link between a putative cause and its effect must be explained through a mechanism governing the cause as well as a generative process that yields the effect of the cause. Under weak assumptions, this framework makes it possible to assess whether X is a stronger predictor of Y or vice versa. Under stronger identifiability assumptions, our framework can distinguish cause from effect using observational data. We establish key statistical properties of this framework. Our proposed methodology relies on scalable nonparametric density estimation using the fast Fourier transform (FFT). The resulting estimation method is many times faster than classical bandwidth-based density estimation while maintaining comparable mean integrated squared error (MISE) rates. We investigate key asymptotic properties of our methodology and introduce a data-splitting technique to facilitate inference. The key attraction of our framework is its inference toolkit, which allows researchers to quantify uncertainty in causal discovery findings. We illustrate the performance of our methodology through simulation studies as well as multiple real-data examples.
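The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea behind FFT-based density estimation. It assumes a Gaussian kernel, a fixed bandwidth, and linear binning of the sample onto a regular grid. This is the standard binned kernel density estimator and not necessarily the authors' implementation. Binning reduces the cost from O(nG) kernel evaluations to O(n + G log G), where G is the number of grid points. The function names `fft_kde` and `direct_kde` and the parameters `grid_size` and `pad` are hypothetical.

```python
import numpy as np


def fft_kde(x, bandwidth, grid_size=1024, pad=4.0):
    """Gaussian KDE on a regular grid via linear binning + FFT convolution.

    Hypothetical sketch: Gaussian kernel, fixed bandwidth, linear binning.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    lo = x.min() - pad * bandwidth
    hi = x.max() + pad * bandwidth
    grid = np.linspace(lo, hi, grid_size)
    delta = grid[1] - grid[0]

    # Linear binning: split each observation's unit mass between its two
    # neighbouring grid points, in proportion to proximity.
    pos = (x - lo) / delta
    left = np.clip(np.floor(pos).astype(int), 0, grid_size - 2)
    frac = pos - left
    counts = np.zeros(grid_size)
    np.add.at(counts, left, 1.0 - frac)
    np.add.at(counts, left + 1, frac)

    # Kernel sampled at every grid offset -(G-1)..(G-1).
    offsets = np.arange(-(grid_size - 1), grid_size) * delta
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))

    # Linear (not circular) convolution via zero-padded real FFTs.
    m = counts.size + kernel.size - 1
    nfft = 1 << (m - 1).bit_length()
    conv = np.fft.irfft(np.fft.rfft(counts, nfft) * np.fft.rfft(kernel, nfft), nfft)

    # Grid point i corresponds to full-convolution index i + (G - 1).
    density = conv[grid_size - 1 : 2 * grid_size - 1] / n
    return grid, density


def direct_kde(x, grid, bandwidth):
    """Reference O(n * G) Gaussian KDE evaluated directly at each grid point."""
    diffs = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (x.size * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
sample = rng.normal(size=2000)
h = 1.06 * sample.std(ddof=1) * sample.size ** (-1 / 5)  # Silverman's rule of thumb
grid, dens = fft_kde(sample, h)
exact = direct_kde(sample, grid, h)
print("max |FFT - direct|:", np.max(np.abs(dens - exact)))
```

The grid is padded by a few bandwidths on each side so that little kernel mass falls outside it. The kernel is zero-padded to at least the full convolution length, which prevents the FFT's circular wrap-around from contaminating the estimate. Its only approximation relative to the direct estimator comes from binning, and that error shrinks as the grid spacing becomes small relative to the bandwidth.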