Optimal Transport on Categorical Data for Counterfactuals using Compositional Data and Dirichlet Transport

Recently, optimal transport-based approaches have gained attention for deriving counterfactuals, e.g., to quantify algorithmic discrimination. However, in the general multivariate setting, these methods are often opaque and difficult to interpret. To address this, alternative methodologies have been proposed that use causal graphs combined with either iterative quantile regressions (Plečko and Meinshausen (2020)) or sequential transport (Fernandes Machado et al. (2025)) to examine fairness at the individual level, often referred to as "counterfactual fairness." Despite these advances, transporting categorical variables remains a significant challenge in practical applications with real datasets. In this paper, we propose a novel approach to address this issue. Our method involves (1) converting categorical variables into compositional data and (2) transporting these compositions within the probability simplex of $\mathbb{R}^d$. We demonstrate the applicability and effectiveness of this approach through an illustration on real-world data, and discuss its limitations.
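The two-step pipeline described above can be illustrated with a minimal Python sketch. It rests on two assumptions. First, each categorical value has already been turned into a composition, for instance the predicted class probabilities of a classifier fitted on the other covariates; here such compositions are simply simulated with Dirichlet draws. Second, the compositions are transported by mapping them to isometric log-ratio (ilr) coordinates, applying the closed-form optimal transport map between Gaussian approximations of the two groups, and mapping the result back to the simplex. This Gaussian map in ilr coordinates is a swapped-in, illustrative choice. It is not necessarily the transport used in the paper, whose title also refers to Dirichlet transport. All function names (`helmert_basis`, `ilr`, `ilr_inv`, `gaussian_ot_map`, `transport_compositions`) are hypothetical.

```python
import numpy as np


def helmert_basis(d):
    """Orthonormal (d-1) x d contrast basis: rows sum to zero, V @ V.T = I."""
    V = np.zeros((d - 1, d))
    for i in range(1, d):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(X, V):
    """Isometric log-ratio transform: compositions (n x d) -> R^(d-1)."""
    return np.log(X) @ V.T


def ilr_inv(Z, V):
    """Inverse ilr: recover clr coordinates, then apply closure (softmax)."""
    clr = Z @ V
    E = np.exp(clr - clr.max(axis=1, keepdims=True))  # numerically stable
    return E / E.sum(axis=1, keepdims=True)


def sym_sqrt(S):
    """Square root of a symmetric positive semi-definite matrix."""
    w, U = np.linalg.eigh(S)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T


def gaussian_ot_map(Z0, Z1):
    """Closed-form OT map between Gaussian fits of two point clouds."""
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    S0 = np.atleast_2d(np.cov(Z0, rowvar=False))
    S1 = np.atleast_2d(np.cov(Z1, rowvar=False))
    S0h = sym_sqrt(S0)
    S0h_inv = np.linalg.inv(S0h)
    A = S0h_inv @ sym_sqrt(S0h @ S1 @ S0h) @ S0h_inv  # symmetric, A S0 A = S1
    # Row-vector form of T(z) = m1 + A (z - m0).
    return lambda Z: m1 + (Z - m0) @ A.T


def transport_compositions(X0, X1):
    """Transport group-0 compositions towards group 1 inside the simplex."""
    V = helmert_basis(X0.shape[1])
    T = gaussian_ot_map(ilr(X0, V), ilr(X1, V))
    return ilr_inv(T(ilr(X0, V)), V)


# Illustrative stand-ins for compositions (e.g. predicted class probabilities
# of a 3-level categorical variable) observed in two groups.
rng = np.random.default_rng(0)
X0 = rng.dirichlet([4.0, 2.0, 1.0], size=200)
X1 = rng.dirichlet([1.0, 2.0, 4.0], size=200)

X0_cf = transport_compositions(X0, X1)
labels_cf = X0_cf.argmax(axis=1)  # one way to read off a counterfactual category
print(X0_cf[:3].round(3), labels_cf[:10])
```

Because the map is computed in ilr coordinates, every transported row is strictly positive and sums to one, so it stays in the simplex by construction. Taking the argmax is one hypothetical way to turn a transported composition back into a counterfactual category.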