We consider how human-centered causal theories and tools from the dynamical systems literature can guide the representation of data when training neural networks for complex classification tasks. Specifically, we use simulated data to show that training a neural network on a data representation that makes explicit the invariant structural causal features of an epidemic system's data-generating process improves out-of-distribution (OOD) generalization on a classification task, compared with a more naive data representation. We take these results to demonstrate that using human-generated causal knowledge to reduce the epistemic uncertainty of ML developers can lead to better-specified ML pipelines. This, in turn, points to the utility of a dynamical systems approach within the broader effort to improve the robustness and safety of machine learning systems through better development practices.