Deep learning has revolutionized human society, yet the black-box nature of deep neural networks hinders their further application in reliability-critical industries. In attempts to unpack these networks, many works observe or intervene on internal variables to improve the comprehensibility and invertibility of black-box models. However, existing methods rely on intuitive assumptions and lack mathematical guarantees. To bridge this gap, we introduce Bort, an optimizer that improves model explainability by imposing boundedness and orthogonality constraints on model parameters; these constraints are derived from sufficient conditions for model comprehensibility and invertibility. We perform reconstruction and backtracking on the model representations optimized by Bort and observe a clear improvement in model explainability. Based on Bort, we are able to synthesize explainable adversarial samples without additional parameters or training. Surprisingly, we find that Bort consistently improves the classification accuracy of various architectures, including ResNet and DeiT, on MNIST, CIFAR-10, and ImageNet. Code: https://github.com/zbr17/Bort.
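The abstract does not spell out Bort's update rule, so the following is only an illustrative sketch of what boundedness and orthogonality constraints on a weight matrix can look like in a gradient step. It is not the authors' algorithm: it assumes a soft orthogonality penalty of the form 0.25 * ||W^T W - I||_F^2 and a hard element-wise clip for boundedness, and the names `constrained_sgd_step`, `orthogonality_error`, `ortho_weight`, and `bound` are hypothetical.

```python
import numpy as np


def orthogonality_error(W):
    """Frobenius norm of W^T W - I (zero when the columns are orthonormal)."""
    k = W.shape[1]
    return np.linalg.norm(W.T @ W - np.eye(k))


def constrained_sgd_step(W, grad, lr=0.1, ortho_weight=1.0, bound=1.0):
    """One SGD step with an assumed orthogonality penalty and box bound.

    W    : (d, k) weight matrix
    grad : (d, k) gradient of the task loss with respect to W
    """
    k = W.shape[1]
    # Gradient of 0.25 * ||W^T W - I||_F^2 with respect to W.
    ortho_grad = W @ (W.T @ W - np.eye(k))
    W_new = W - lr * (grad + ortho_weight * ortho_grad)
    # Boundedness (assumption): clip every entry into [-bound, bound].
    return np.clip(W_new, -bound, bound)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.3 * rng.standard_normal((8, 4))
    print("orthogonality error before:", orthogonality_error(W))
    for _ in range(200):
        # Zero task gradient, so only the constraints act on W.
        W = constrained_sgd_step(W, np.zeros_like(W))
    print("orthogonality error after: ", orthogonality_error(W))
```

With a zero task gradient, repeated steps should pull the columns of `W` toward an orthonormal set while keeping every entry inside the bound; the actual constraints and their derivation are given in the paper and the linked repository.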