Chronological age predictors often fail to achieve out-of-distribution (OOD) generalization due to exogenous attributes such as race, gender, or tissue. Learning an invariant representation with respect to those attributes is therefore essential to improve OOD generalization and prevent overly optimistic results. In predictive settings, these attributes motivate bias mitigation; in causal analyses, they appear as confounders; and when protected, their suppression leads to fairness. We coherently explore these concepts with theoretical rigor and discuss the scope of an interpretable neural network model based on adversarial representation learning. Using publicly available mouse transcriptomic datasets, we illustrate the behavior of this model relative to conventional machine learning models. We observe that the outcome of this model is consistent with the predictive results of a published study demonstrating the effects of Elamipretide on mouse skeletal and cardiac muscle. We conclude by discussing the limitations of deriving causal interpretation from such purely predictive models.