Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives

The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe that these methods converge, even though they were developed independently across the subfields of trustworthy machine learning. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by the causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques such as fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning from human feedback (RLHF), we draw connections between them and standard ERM. This connection lets us extend the principled understanding of trustworthy methods to these newer techniques for large pretrained models, paving the way for future methods. We also review existing methods from this perspective. Lastly, we offer a brief summary of the applications of these methods and discuss potential future directions related to our survey. For more information, please visit http://trustai.one.
