We address the problem of inferring the causal direction between a continuous variable $X$ and a discrete variable $Y$ from observational data. For the model $X \to Y$, we adopt the threshold model used in prior work. For the model $Y \to X$, we consider two cases: (1) the conditional distributions of $X$ given different values of $Y$ form a location-shift family, and (2) they are mixtures of generalized normal distributions with independently parameterized components. We establish identifiability of the causal direction through three theoretical results. First, we prove that under $X \to Y$, the ratio of the conditional densities of $X$ given different values of $Y$ is monotonic in $X$. Second, we establish that under $Y \to X$ with non-location-shift conditionals, this density ratio is monotonic only on a set of Lebesgue measure zero in the parameter space. Third, we show that under $X \to Y$, the conditional distributions can form a location-shift family only through a precise coordination between the causal mechanism and the input distribution, which is non-generic under the principle of independent mechanisms. Together, these results imply that a monotonic density ratio characterizes the direction $X \to Y$, whereas a non-monotonic density ratio or location-shift conditionals characterize $Y \to X$. Based on these results, we propose Density Ratio-based Causal Discovery (DRCD), a method that determines the causal direction by testing for location-shift conditionals and for monotonicity of the estimated density ratio. Experiments on synthetic and real-world datasets demonstrate that DRCD outperforms existing methods.
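To make the monotonicity criterion concrete, the sketch below evaluates the density ratio $p(x \mid Y=1)/p(x \mid Y=0)$ in closed form for binary $Y$ under three illustrative, assumed models rather than the paper's general setting. For $X \to Y$, it assumes $X \sim N(0,1)$, $E \sim N(0,1)$ and $Y = \mathbb{1}[X + E > \theta]$; by Bayes' rule $p(x)$ cancels, leaving $\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} \cdot \frac{P(Y=0)}{P(Y=1)}$, which is increasing in $x$. For $Y \to X$, it assumes Gaussian conditionals, either a pure location shift or a shift plus a scale change. The equal-variance Gaussian location shift also yields a monotonic ratio (an exponential of a linear function), which illustrates why a location-shift test is needed alongside the monotonicity test. The function `decide_direction` is a hypothetical paraphrase of the decision logic described above; the statistical tests that DRCD uses to produce its two inputs are not reproduced here.

```python
import math

SQRT2 = math.sqrt(2.0)


def normal_cdf(z):
    """Standard normal CDF Phi(z), computed via erfc for better tail accuracy."""
    return 0.5 * math.erfc(-z / SQRT2)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ratio_x_to_y(x, theta=0.0):
    """p(x | Y=1) / p(x | Y=0) under an ASSUMED threshold model:
    X ~ N(0, 1), E ~ N(0, 1), Y = 1[X + E > theta].

    Bayes' rule cancels p(x):
      ratio = [P(Y=1|x) / P(Y=0|x)] * [P(Y=0) / P(Y=1)].
    """
    p1_given_x = normal_cdf(x - theta)  # P(E > theta - x)
    p0_given_x = normal_cdf(theta - x)  # P(E <= theta - x)
    p1 = normal_cdf(-theta / SQRT2)     # X + E ~ N(0, 2), so P(Y=1) = Phi(-theta/sqrt(2))
    return (p1_given_x / p0_given_x) * ((1.0 - p1) / p1)


def ratio_y_to_x(x, mu1, sigma1):
    """p(x | Y=1) / p(x | Y=0) for an ASSUMED Gaussian Y -> X model:
    X | Y=0 ~ N(0, 1) and X | Y=1 ~ N(mu1, sigma1^2)."""
    return normal_pdf(x, mu1, sigma1) / normal_pdf(x)


def is_monotone(values):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


def decide_direction(is_location_shift, ratio_is_monotone):
    """HYPOTHETICAL decision rule paraphrasing the abstract: location-shift
    conditionals or a non-monotonic ratio point to Y -> X; otherwise X -> Y."""
    if is_location_shift:
        return "Y->X"
    return "X->Y" if ratio_is_monotone else "Y->X"


# Evaluation grid on [-4, 4], where all densities stay well away from underflow.
GRID = [-4.0 + 8.0 * i / 200 for i in range(201)]

if __name__ == "__main__":
    cases = {
        "X->Y, threshold model": [ratio_x_to_y(x, theta=0.5) for x in GRID],
        "Y->X, location shift": [ratio_y_to_x(x, 1.0, 1.0) for x in GRID],
        "Y->X, shift + scale change": [ratio_y_to_x(x, 1.0, 2.0) for x in GRID],
    }
    for name, ratios in cases.items():
        print(f"{name}: monotone = {is_monotone(ratios)}")
```

In the shift-plus-scale case, the log ratio is $\tfrac{3}{8}x^2 + \tfrac{1}{4}x + \text{const}$, which has its minimum at $x = -1/3$. The ratio therefore falls and then rises on the grid. In practice the ratio would have to be estimated from samples (for example, by kernel density estimation), and the monotonicity check would need to tolerate estimation noise. The exact-sign test used here is only suitable for the closed-form illustration.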