Depicting deterministic variables within directed acyclic graphs (DAGs): An aid for identifying and interpreting causal effects involving tautological associations, compositional data, and composite variables

3 February 2023

Laurie Berrie, Kellyn F. Arnold, Georgia D. Tomova, Mark S. Gilthorpe, Peter W. G. Tennant

From arXiv, 18 pages, 5 figures

Deterministic variables are variables that are fully explained by one or more parent variables. They commonly arise when a variable has been algebraically constructed from one or more parent variables, as with composite variables, and in compositional data, where the 'whole' variable is determined from its 'parts'. This article introduces how deterministic variables may be depicted within directed acyclic graphs (DAGs) to help with identifying and interpreting causal effects involving tautological associations, compositional data, and composite variables. We propose a two-step approach in which all variables are initially considered, and an explicit choice is then made whether to focus on the deterministic variable(s) or the determining parents. Depicting deterministic variables within DAGs brings several benefits. It is easier to identify and avoid misinterpreting tautological associations, i.e., self-fulfilling associations between variables with shared algebraic parent variables. In compositional data, it is easier to understand the consequences of conditioning on the 'whole' variable, and correctly identify total and relative causal effects. For composite variables, it encourages greater consideration of the target estimand and greater scrutiny of the consistency and exchangeability assumptions. DAGs with deterministic variables are a useful aid for planning and interpreting analyses involving tautological associations, compositional data, and/or composite variables.

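Two of the phenomena named in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating model (independent standard normal parents combined by simple sums) and all names (`simulate`, `pearson`, `stratum_width`) are assumptions chosen for illustration. It shows (1) a tautological association, where two composite variables correlate only because they share an algebraic parent, and (2) a compositional effect, where conditioning on the 'whole' induces a strong negative association between otherwise independent 'parts'.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0, stratum_width=0.1):
    """Simulate three independent parents and derived deterministic variables.

    Illustrative assumption: x, y, z are independent standard normals.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]

    # (1) Tautological association: a and b are composites that share
    # the algebraic parent z, so they correlate although x and y do not.
    a = [xi + zi for xi, zi in zip(x, z)]
    b = [yi + zi for yi, zi in zip(y, z)]

    # (2) Compositional data: whole = x + y is fully determined by its
    # parts. Restricting to a narrow stratum of the whole (a crude form
    # of conditioning) forces the parts to trade off against each other.
    stratum = [(xi, yi) for xi, yi in zip(x, y)
               if abs(xi + yi) < stratum_width]
    parts_x = [p[0] for p in stratum]
    parts_y = [p[1] for p in stratum]

    return {
        "parents": pearson(x, y),
        "composites": pearson(a, b),
        "parts_given_whole": pearson(parts_x, parts_y),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the parent correlation should be close to zero, the composite correlation close to 0.5 (the shared parent contributes half of each composite's variance), and the within-stratum correlation of the parts close to -1. None of these associations reflects a causal effect of one variable on another; they follow from the algebra alone.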