This paper introduces semantic features as a general conceptual framework for fully explainable neural network layers. A well-motivated proof-of-concept model for a relevant subproblem of MNIST consists of 4 such layers with a total of 4.8K learnable parameters. The model is easily interpretable, achieves human-level adversarial test accuracy without any form of adversarial training, requires little hyperparameter tuning, and can be quickly trained on a single CPU. The general nature of the technique holds promise for a paradigm shift towards radically democratised and truly generalizable white-box neural networks. The code is available at https://github.com/314-Foundation/white-box-nn