I introduce a unified framework for interpreting neural network classifiers, tailored toward automated scientific discovery. Unlike in neural network-based regression, in classification it is generally impossible to find a one-to-one mapping from the neural network to a symbolic equation, even if the network bases its classification on a quantity that can be written as a closed-form equation. In this paper, I embed a trained neural network into an equivalence class of classifying functions that base their decisions on the same quantity. I interpret the network by finding an intersection between this equivalence class and the set of human-readable equations defined by the search space of symbolic regression. The approach is not limited to classifiers or full neural networks: it can also be applied to arbitrary neurons in hidden layers or latent spaces, or used to simplify the interpretation of neural network regressors.
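To make the notion of "functions that base their decisions on the same quantity" more concrete, the following is a minimal sketch under stated assumptions, not the framework's actual implementation. It assumes a hypothetical closed-form quantity `g(x) = x0 * x1` and a stand-in `classifier` that is a sigmoid of a monotonic function of `g`; no trained network is involved. By the chain rule, two differentiable functions related by an invertible reparametrization with nonzero derivative have parallel gradients, so the sketch checks gradient alignment numerically as one possible test of membership in the same class. All names here are illustrative.

```python
import numpy as np

def g(x):
    # Hypothetical closed-form quantity the classifier is assumed to rely on.
    return x[..., 0] * x[..., 1]

def unrelated(x):
    # A different quantity, used as a negative control.
    return x[..., 0] + x[..., 1]

def classifier(x):
    # Stand-in for a trained network (assumption): sigmoid of a monotonic function of g.
    return 1.0 / (1.0 + np.exp(-(3.0 * g(x) - 1.0)))

def num_grad(f, x, eps=1e-5):
    # Central finite-difference gradient of f at each row of x.
    grads = np.zeros_like(x)
    for i in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[i] = eps
        grads[:, i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grads

def unit(v):
    # Normalize each row to unit length so only the direction is compared.
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def gradients_aligned(f, h, x, atol=1e-6):
    # True if the gradients of f and h are parallel (up to sign) at every sample.
    cos = np.sum(unit(num_grad(f, x)) * unit(num_grad(h, x)), axis=1)
    return bool(np.allclose(np.abs(cos), 1.0, atol=atol))

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, size=(100, 2))

same_class = gradients_aligned(classifier, g, x)
different_class = gradients_aligned(classifier, unrelated, x)
print(same_class, different_class)
```

In this toy setting the classifier output and `g` differ in value everywhere, yet they share gradient directions, which is why a direct output-to-equation regression would miss `g` while a comparison at the level of the equivalence class can recover it. A symbolic regression search would play the role of proposing candidates like `g` and `unrelated`.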