This paper introduces semantic features as a candidate conceptual framework for white-box neural networks. A proof-of-concept model for an informative subproblem of MNIST consists of 4 semantic feature layers with a total of 5K learnable parameters. The model is well-motivated, inherently interpretable, requires little hyperparameter tuning, and achieves almost human-level adversarial test metrics, with no form of adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn