Operationalizing Longitudinal Causal Discovery Under Real-World Workflow Constraints

Causal discovery has achieved substantial theoretical progress, yet its deployment in large-scale longitudinal systems remains limited. A key obstacle is that operational data are generated under institutional workflows whose induced partial orders are rarely formalized, enlarging the admissible graph space in ways inconsistent with the recording process. We characterize a workflow-induced constraint class for longitudinal causal discovery that restricts the admissible directed acyclic graph space through protocol-derived structural masks and timeline-aligned indexing. Rather than introducing a new optimization algorithm, we show that explicitly encoding workflow-consistent partial orders reduces structural ambiguity, especially in mixed discrete--continuous panels where within-time orientation is weakly identified. The framework combines workflow-derived admissible-edge constraints, measurement-aligned time indexing and block structure, bootstrap-based uncertainty quantification for lagged total effects, and a dynamic representation supporting intervention queries. In a nationwide annual health screening cohort in Japan with 107,261 individuals and 429,044 person-years, workflow-constrained longitudinal LiNGAM yields temporally consistent within-time substructures and interpretable lagged total effects with explicit uncertainty. Sensitivity analyses using alternative exposure and body-composition definitions preserve the main qualitative patterns. We argue that formalizing workflow-derived constraint classes improves structural interpretability without relying on domain-specific edge specification, providing a reproducible bridge between operational workflows and longitudinal causal discovery under standard identifiability assumptions.

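To make the idea of a protocol-derived structural mask concrete, the following is a minimal sketch, not the authors' implementation. It assumes each variable is assigned a hypothetical integer workflow stage (its position in the recording order within a visit) and that variables are repeated across annual time points. The function name `build_admissible_mask`, the `stages` encoding, and the `max_lag` parameter are illustrative assumptions that do not appear in the abstract.

```python
import numpy as np


def build_admissible_mask(stages, n_timepoints, max_lag=1):
    """Return a boolean matrix M where M[src, dst] is True if src -> dst is admissible.

    stages: hypothetical workflow stage of each variable within one visit
            (smaller value = recorded earlier in the protocol).
    Nodes are indexed as t * p + v for time point t and variable v.
    """
    stages = np.asarray(stages)
    p = stages.size

    # Time index and workflow stage of every (time point, variable) node.
    time = np.repeat(np.arange(n_timepoints), p)
    stage = np.tile(stages, n_timepoints)

    # dt[src, dst] = time of dst minus time of src.
    dt = time[None, :] - time[:, None]

    # Lagged edges: only forward in time, up to max_lag steps.
    lagged = (dt >= 1) & (dt <= max_lag)

    # Within-time edges: only from an earlier-or-equal stage to a later-or-equal
    # stage. Variables sharing a stage are unordered by the workflow, so both
    # directions stay admissible and are left to the discovery algorithm.
    same_time = dt == 0
    stage_ok = stage[:, None] <= stage[None, :]

    mask = lagged | (same_time & stage_ok)
    np.fill_diagonal(mask, False)  # no self-loops
    return mask


# Illustrative stages only: one questionnaire item (stage 0) recorded before
# two measurements (stage 1), observed at two time points.
mask = build_admissible_mask([0, 1, 1], n_timepoints=2)
print(mask.astype(int))
```

Under these assumptions, the mask only removes edges that contradict the recording order. Orientation among variables that share a stage is still left to the discovery step, which matches the abstract's emphasis on partial rather than total orders.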