Evaluating Brain-Inspired Modular Training in Automated Circuit Discovery for Mechanistic Interpretability

Large Language Models (LLMs) have risen rapidly in AI, transforming a wide range of applications with their advanced capabilities. As these models become increasingly integral to decision-making, thorough interpretability has never been more critical. Mechanistic Interpretability offers a pathway to this understanding by identifying and analyzing specific sub-networks, or 'circuits', within these complex systems. A crucial aspect of this approach is Automated Circuit Discovery, which makes it feasible to study large models such as GPT4 or LLAMA. In this context, our research evaluates a recent method, Brain-Inspired Modular Training (BIMT), designed to enhance the interpretability of neural networks. We demonstrate that BIMT significantly improves the efficiency and quality of Automated Circuit Discovery, overcoming the limitations of manual methods. Our comparative analysis further shows that BIMT outperforms existing models in circuit quality, discovery time, and sparsity. We also provide a comprehensive computational analysis of BIMT, covering training duration, memory allocation requirements, and inference speed. Beyond demonstrating how effectively BIMT makes neural networks easier to understand, this study advances the broader goal of building trustworthy and transparent AI systems.
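The abstract does not spell out how BIMT works. As originally proposed, BIMT places neurons at positions in a geometric space and adds to the task loss a penalty on every connection equal to its weight magnitude times its length, which pushes the network toward sparse, spatially local (modular) wiring. The following is a minimal sketch of that penalty only, under stated assumptions: a small multilayer perceptron whose layer `l` sits on the line y = l with neurons evenly spaced in x over [0, 1], Euclidean distance, and plain Python lists for weights. The names `connection_cost`, `bimt_style_loss`, `sparsity`, `lam`, and `threshold` are hypothetical; the neuron-swapping step of the original method and all training are omitted, and the sparsity measure is an illustrative choice, not necessarily the metric used in this study.

```python
import math
from typing import List, Tuple

# weights[l][j][i] is the (hypothetical) weight from neuron i in layer l
# to neuron j in layer l + 1.
Matrix = List[List[float]]


def neuron_positions(layer_sizes: List[int]) -> List[List[Tuple[float, float]]]:
    """Assumed layout: layer l on the line y = l, neurons evenly spaced in x over [0, 1]."""
    positions = []
    for layer, n in enumerate(layer_sizes):
        xs = [0.5] if n == 1 else [i / (n - 1) for i in range(n)]
        positions.append([(x, float(layer)) for x in xs])
    return positions


def connection_cost(weights: List[Matrix], layer_sizes: List[int]) -> float:
    """Distance-weighted L1 penalty: sum over all connections of |w_ji| * d_ji."""
    pos = neuron_positions(layer_sizes)
    total = 0.0
    for l, w in enumerate(weights):
        for j, row in enumerate(w):
            for i, w_ji in enumerate(row):
                (x0, y0), (x1, y1) = pos[l][i], pos[l + 1][j]
                total += abs(w_ji) * math.hypot(x1 - x0, y1 - y0)
    return total


def bimt_style_loss(task_loss: float, weights: List[Matrix],
                    layer_sizes: List[int], lam: float = 1e-3) -> float:
    """Task loss plus lam times the wiring cost (hypothetical helper)."""
    return task_loss + lam * connection_cost(weights, layer_sizes)


def sparsity(weights: List[Matrix], threshold: float = 1e-2) -> float:
    """Illustrative sparsity: fraction of weights with magnitude below threshold."""
    flat = [abs(v) for w in weights for row in w for v in row]
    return sum(v < threshold for v in flat) / len(flat)


# Two toy 2 -> 2 layers: straight (local) wiring versus crossed wiring.
straight = [[[1.0, 0.0], [0.0, 1.0]]]
crossed = [[[0.0, 1.0], [1.0, 0.0]]]
print(connection_cost(straight, [2, 2]))  # 2.0
print(connection_cost(crossed, [2, 2]))   # about 2.828
print(sparsity(straight))                 # 0.5
```

With these toy weights the crossed wiring costs 2·√2 ≈ 2.83 against 2.0 for the straight wiring, so minimizing the penalty favors short, local connections, while the weight-magnitude factor drives unneeded connections toward zero. Both effects are what would make circuits in a BIMT-trained network sparser and easier to isolate.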
