Score-based methods are effective for causal discovery: they score candidate causal structures by their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that handles general data distributions and causal relationships by modeling the relations in a reproducing kernel Hilbert space (RKHS). Selecting an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method selects kernel parameters manually through heuristics, which is tedious and unlikely to yield an optimal choice. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method that maximizes the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.
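The general idea of choosing a kernel by maximizing a marginal likelihood can be illustrated with a minimal sketch. This is not the method derived in the paper: it substitutes the standard Gaussian-process regression marginal likelihood with an RBF kernel, and the function names, the fixed `noise_var`, the candidate grid, and the toy data are all assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(x, width):
    """RBF Gram matrix for samples x of shape (n, d)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * width ** 2))

def log_marginal_likelihood(x, y, width, noise_var=0.1):
    """Log p(y | x, width) under a zero-mean GP with Gaussian noise."""
    n = x.shape[0]
    K = rbf_kernel(x, width) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -0.5 * y^T K^{-1} y - 0.5 * log|K| - 0.5 * n * log(2*pi)
    return float(-0.5 * y @ alpha
                 - np.sum(np.log(np.diag(L)))
                 - 0.5 * n * np.log(2.0 * np.pi))

def select_width(x, y, candidates, noise_var=0.1):
    """Return the candidate width with the highest marginal likelihood."""
    scores = [log_marginal_likelihood(x, y, w, noise_var) for w in candidates]
    return candidates[int(np.argmax(scores))], scores

# Toy data (hypothetical): a nonlinear effect of x on y plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(2.0 * x[:, 0]) + 0.3 * rng.standard_normal(200)

candidates = [0.01, 0.1, 0.5, 1.0, 5.0, 50.0]
best_width, scores = select_width(x, y, candidates, noise_var=0.09)
print("selected width:", best_width)
```

A grid search keeps the sketch short; in practice the width (and the noise variance) would more likely be optimized by gradient-based methods, and the selection would be repeated for the variables involved in each search step.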