Understanding the inner mechanisms of black-box foundation models (FMs) is essential yet challenging for artificial intelligence and its applications. Over the last decade, research has largely focused on explainability, producing post-hoc methods that rationalize specific decisions already made by black-box FMs. However, these explainable methods fall short in faithfulness, detail capture, and resource efficiency. In response, a new class of interpretable methods is needed to unveil the underlying mechanisms of FMs in a way that is accurate, comprehensive, heuristic, and resource-light. This survey reviews interpretable methods that satisfy these principles and have been successfully applied to FMs. These methods are deeply rooted in machine learning theory, covering the analysis of generalization performance, expressive capability, and dynamic behavior. They provide a thorough interpretation of the entire FM workflow, from inference capability and training dynamics to ethical implications. Drawing on these interpretations, the review concludes by identifying frontier research directions for FMs.