Determining causal relationships between high-dimensional observations is among the most important tasks in scientific discovery. In this paper, we revisit the \emph{linear trace method}, a technique proposed in~\citep{janzing2009telling,zscheischler2011testing} to infer the causal direction between two high-dimensional random variables. We significantly strengthen the existing results by providing an improved tail analysis and by extending them to nonlinear trace functionals, with sharper confidence bounds under certain distributional assumptions. We obtain our results by interpreting the trace estimator in the causal regime as a function over random orthogonal matrices, to which concentration results for Lipschitz functions on such spaces can be applied. We additionally propose a novel ridge-regularized variant of the estimator in \cite{zscheischler2011testing}, and give provable bounds relating the ridge-estimated terms to their ground-truth counterparts. We support our theoretical results with encouraging experiments on synthetic datasets, most notably in the high-dimension, low-sample-size regime.