On Functional Activations in Deep Neural Networks

Background: Deep neural networks have proven to be powerful computational tools for modeling, prediction, and generation. However, the workings of these models have generally been opaque. Recent work has shown that the performance of some models is modulated by overlapping functional networks of connections within the models. Here, the techniques of functional neuroimaging are applied to an exemplary large language model to probe its functional structure. Methods: A series of block-designed, task-based prompt sequences was generated to probe the Facebook Galactica-125M model. Tasks included prompts relating to political science, medical imaging, paleontology, archeology, pathology, and random strings, presented in an off/on/off pattern with prompts about other random topics. For the generation of each output token, all layer output values were saved to create an effective time series. General linear models were fit to the data to identify layer output values that were active during the tasks. Results: Distinct, overlapping networks were identified for each task. The greatest overlap was observed between the medical imaging and pathology networks. These networks were reproducible across repeated performance of related tasks, and the correspondence between the identified functional networks and the activation in held-out tasks (tasks not used to define the networks) accurately identified the presented task. Conclusion: The techniques of functional neuroimaging can be applied to deep neural networks as a means to probe their workings. The identified functional networks hold potential for use in model alignment, modulation of model output, and identification of weights to target in fine-tuning.
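The analysis described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the function names (`block_regressor`, `glm_t_stats`, `dice`) are hypothetical, and the threshold and the Dice overlap measure are assumptions, since the abstract names neither a test statistic nor an overlap metric. The sketch builds an off/on/off boxcar regressor over output tokens, fits an ordinary least-squares general linear model to each unit's time series, and thresholds the task t-statistic to obtain a "functional network".

```python
import numpy as np

def block_regressor(n_tokens, block_len):
    """Boxcar regressor for an off/on/off design: 1 during the 'on' block, else 0."""
    reg = np.zeros(n_tokens)
    reg[block_len:2 * block_len] = 1.0
    return reg

def glm_t_stats(Y, task):
    """Fit y = b0 + b1 * task + noise for every unit by ordinary least squares.

    Y has shape (n_tokens, n_units); one row per generated output token.
    Returns the t-statistic of the task coefficient b1 for each unit.
    """
    n = Y.shape[0]
    X = np.column_stack([np.ones(n), task])        # design matrix: intercept + task
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof        # residual variance per unit
    c = np.array([0.0, 1.0])                       # contrast selecting the task effect
    var_c = c @ np.linalg.inv(X.T @ X) @ c
    return beta[1] / np.sqrt(sigma2 * var_c)

def dice(a, b):
    """Dice overlap between two boolean masks (an assumed overlap measure)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * (a & b).sum() / denom if denom else 0.0

# Synthetic stand-in for saved layer outputs: 90 tokens, 50 units.
rng = np.random.default_rng(0)
n_tokens, n_units, block_len = 90, 50, 30
task = block_regressor(n_tokens, block_len)
Y = rng.normal(size=(n_tokens, n_units))
Y[:, :5] += 3.0 * task[:, None]                    # units 0-4 respond to the task

t = glm_t_stats(Y, task)
active = t > 6.0                                   # arbitrary illustrative threshold
print("active units:", np.flatnonzero(active))
```

In a real experiment, `Y` would hold the layer output values recorded for each generated token, and masks from two tasks could be compared with `dice` to quantify how much their networks overlap.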
