Despite recent advances in explainability, much remains unknown about the algorithms that neural networks learn to represent. One line of work has attempted to understand trained models by decomposing them into functional circuits (Csordás et al., 2020; Lepori et al., 2023). To advance this research, we developed NeuroSurgeon, a Python library for discovering and manipulating subnetworks within models in the Huggingface Transformers library (Wolf et al., 2019). NeuroSurgeon is freely available at https://github.com/mlepori1/NeuroSurgeon.
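To make the idea of a subnetwork more concrete, here is a minimal sketch. It assumes that a subnetwork is represented as a binary mask over one layer's weights. The helpers `linear`, `apply_mask`, and `complement`, along with the toy weights and mask, are hypothetical. This is not the NeuroSurgeon API, and the sketch does not show how a mask is discovered. It only shows what running or removing a given subnetwork means.

```python
# Hypothetical illustration: a "subnetwork" as a binary mask over a layer's weights.
# NOT the NeuroSurgeon API; all names and values here are made up for the example.


def linear(weights, x):
    """Compute y = W x for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def apply_mask(weights, mask):
    """Keep weights where mask == 1 and zero out the rest (elementwise product)."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def complement(mask):
    """Flip a binary mask so it selects every weight outside the subnetwork."""
    return [[1 - m for m in row] for row in mask]


W = [[0.5, -1.0, 2.0],
     [1.5, 0.0, -0.5]]
mask = [[1, 0, 1],
        [0, 0, 1]]  # hypothetical subnetwork: 3 of the 6 weights
x = [1.0, 2.0, 3.0]

full = linear(W, x)                                   # the whole layer
sub = linear(apply_mask(W, mask), x)                  # only the subnetwork
ablated = linear(apply_mask(W, complement(mask)), x)  # layer with the subnetwork removed

print(full, sub, ablated)
```

In this single linear layer, the subnetwork's output and the ablated layer's output add up to the full output. This additivity no longer holds once nonlinearities are stacked between layers, which is one reason subnetworks in real models are studied by running or ablating them directly.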